
Research On Personalization Techniques With Their Applications To Digital Library

Posted on: 2010-10-25
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Zhang
Full Text: PDF
GTID: 1118330332978370
Subject: Computer Science and Technology
Abstract/Summary:
Recently, several mass book-digitization projects around the world have made great progress, and personalization techniques for large-scale digital libraries have become an important research direction. The author participated fully in the first-phase construction of the China-America Digital Academic Library (CADAL) as a senior technician in charge of developing the CADAL portal, especially the recommendation and search applications for the million books in CADAL.

This thesis focuses on personalization techniques and their application to the million e-books in CADAL. For single-criterion and multi-criteria recommender systems, we propose new collaborative filtering methods from the perspective of probabilistic graphical models. However, most users of the CADAL portal have been reluctant to provide explicit ratings for books of interest, so ratings-based recommender systems did not work well there. Hence, the click-through logs of books are used to recommend relevant books in real time by means of sequential pattern mining. Moreover, visitors can customize multimedia rules in their personal space in the CADAL portal, and the rule-based recommender system delivers appropriate recommendations in a timely manner according to those rules. For the book search applications, user-friendly human-computer interaction (HCI) design is the major concern. The main contributions of this thesis are:

(1) We propose an effective absorbing random walk model for single-criterion recommender systems. The single-criterion ratings data set is first transformed into a bipartite graph in which each node is connected to a dummy node. Under the constraint of this augmented bipartite graph, the top-N recommendation task is cast as a graph-based semi-supervised learning problem using a Gaussian random field, from which we derive an effective absorbing random walk model that takes the degree of each node into account.
Experimental results on two real-world ratings data sets show the effectiveness of the proposed model.

(2) We propose two multi-criteria probabilistic latent semantic analysis (pLSA) models for multi-criteria recommender systems. The well-known pLSA model is extended to handle multi-criteria ratings. The multi-criteria pLSA keeps the latent variable introduced in pLSA, which corresponds to the user group. However, two different multivariate probability distributions are used to model the multi-criteria ratings of each user. Experimental results on the Yahoo! Movies multi-criteria ratings data set show that the two multi-criteria pLSA models significantly outperform the corresponding single-criterion pLSA and the other methods examined.

(3) Chapter 5 introduces the real-time book recommendation service, which is based on a compact navigation pattern tree indexed by a red-black header tree. The prefix tree structure handles the growth of access logs incrementally, and the red-black header tree greatly improves the scalability of the compact navigation pattern tree. We propose the corresponding construction algorithm and a divide-and-conquer real-time recommendation algorithm based on this scalable navigation pattern tree. Experimental results on the click-through logs of the CADAL portal show the effectiveness and high scalability of the proposed approach.

(4) Chapter 6 introduces the search services for the million books and the personal space in the CADAL portal. We developed a novel, user-friendly interface for the metadata-based book search service, as well as a book chapter search application that supports query expansion and exploratory search. In the personal space, we developed a multimedia rule-based recommender system in which users can customize three kinds of multimedia rules: book, image, and calligraphy character.
The rule-based recommender system actively pushes appropriate content to users according to content similarity as well as collective wisdom mined from logs or user feedback.
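
To make the idea behind contribution (1) more concrete, the following is a minimal sketch of an absorbing random walk on a user-item bipartite graph. It is not the model derived in the thesis: the abstract does not give the exact formulation, so the sketch assumes a single shared dummy node with a constant edge weight, treats the items already rated by the target user as absorbing states with value 1 and the dummy node as an absorbing state with value 0, and solves for the resulting harmonic function by simple iteration. The function name `absorbing_walk_scores`, the parameter `dummy_weight`, and the toy ratings are all hypothetical.

```python
from collections import defaultdict


def absorbing_walk_scores(ratings, target_user, dummy_weight=1.0, iters=200):
    """Rank unseen items by the probability that a random walk starting at
    the item is absorbed at one of the target user's rated items before it
    is absorbed at the dummy node.

    ratings: iterable of (user, item, rating) triples with positive ratings.
    """
    # Build the weighted bipartite graph. Node keys are tagged so that
    # user ids and item ids cannot collide.
    adj = defaultdict(dict)
    for user, item, rating in ratings:
        u, i = ("user", user), ("item", item)
        adj[u][i] = float(rating)
        adj[i][u] = float(rating)

    # Items the target user has rated absorb the walk with value 1.
    # The dummy node absorbs with value 0, so it adds nothing to the
    # numerator and only enlarges the denominator below.
    seeds = set(adj.get(("user", target_user), {}))

    score = {node: 0.0 for node in adj}
    for node in seeds:
        score[node] = 1.0

    # Every other node is transient. Repeatedly replace its score with the
    # weighted average over its neighbours plus the (assumed) dummy edge.
    transient = [node for node in adj if node not in seeds]
    for _ in range(iters):
        for node in transient:
            total = sum(w * score[nbr] for nbr, w in adj[node].items())
            degree = sum(adj[node].values())
            score[node] = total / (degree + dummy_weight)

    unseen = [(node[1], s) for node, s in score.items()
              if node[0] == "item" and node not in seeds]
    return sorted(unseen, key=lambda pair: pair[1], reverse=True)


# Toy ratings, invented purely for illustration.
ratings = [
    ("alice", "b1", 5), ("alice", "b2", 4),
    ("bob", "b1", 5), ("bob", "b3", 4),
    ("carol", "b2", 1), ("carol", "b4", 3),
]
recs = absorbing_walk_scores(ratings, "alice")
for item, value in recs:
    print(item, round(value, 3))
```

In this toy data set, b3 ranks above b4 for alice because bob, who agrees strongly with her on b1, also rated b3, whereas carol shares only a weakly rated book with her. With a constant dummy weight, high-degree nodes lose proportionally less probability mass to the dummy node. This is one simple way in which node degree can enter such a model, and the thesis may handle it differently.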
Keywords/Search Tags: Personalization, Digital Library, Recommender System, Book Search, Random Field, Absorbing Random Walk, Mixture Model, Sequential Pattern Mining