Study Of Statistical Forecasting In Recommender Systems

Posted on: 2011-04-01
Degree: Master
Type: Thesis
Country: China
Candidate: H K Zhu
Full Text: PDF
GTID: 2178360308452381
Subject: Computer system architecture

Abstract/Summary:
With the spread of the Internet and the growth of e-commerce, the volume of online information is expanding rapidly, making information overload and information confusion increasingly serious problems. Recommender systems have emerged in response as an effective method of information filtering. By interacting directly with users and imitating the way a vendor recommends products, recommender systems help users find the goods they actually need among vast amounts of product information. Although recommender systems have recently made great progress on several key techniques, both in theory and in practice, a number of challenges remain, such as recommendation accuracy and the sparsity of relation matrices.

Collaborative filtering (CF) is the key technique in recommender systems and one of the most widely used recommendation techniques. Memory-based CF, an important branch of CF, falls into two categories: user-based CF and item-based CF. For both, the core issue is finding truly similar neighbors. This thesis investigates several key techniques in detail, including various recommendation algorithms and the design and architecture of recommender systems. Its contribution lies mainly in the following three aspects.

First, the similarity measures popular in current CF recommender systems are analyzed theoretically, and the limitations of some specific measures are examined. At present, the most commonly used similarity measures in recommender systems include cosine similarity, Pearson similarity, and Euclidean distance-based similarity.
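For orientation, the three measures named above can be sketched as follows. This is a minimal illustration and not the thesis's own formulation. It assumes each user is a dictionary mapping item ids to ratings, that similarity is computed over co-rated items only, and that Euclidean distance is mapped to a similarity by the common 1 / (1 + d) convention. The helper name `co_rated` is hypothetical.

```python
import math

def co_rated(u, v):
    """Return both users' ratings restricted to the items both have rated."""
    common = sorted(set(u) & set(v))
    return [u[i] for i in common], [v[i] for i in common]

def cosine_similarity(u, v):
    """Cosine of the angle between the two co-rated rating vectors."""
    x, y = co_rated(u, v)
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return sum(a * b for a, b in zip(x, y)) / norm if norm else 0.0

def pearson_similarity(u, v):
    """Pearson correlation of the co-rated ratings (0.0 if undefined)."""
    x, y = co_rated(u, v)
    if len(x) < 2:
        return 0.0
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    denom = math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / denom if denom else 0.0

def euclidean_similarity(u, v):
    """Assumed convention: similarity = 1 / (1 + Euclidean distance)."""
    x, y = co_rated(u, v)
    if not x:
        return 0.0
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return 1.0 / (1.0 + dist)

# Hypothetical ratings: bob rates every shared item one point lower than alice.
alice = {"a": 5, "b": 3, "c": 4}
bob = {"a": 4, "b": 2, "c": 3, "d": 5}
print(cosine_similarity(alice, bob))
print(pearson_similarity(alice, bob))
print(euclidean_similarity(alice, bob))
```

In this toy case the Pearson value is 1 up to rounding, because a constant offset between two users does not affect correlation, while the Euclidean-based value is penalized by the same offset. Differences of this kind are what a comparison of the measures has to account for.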
We analyze and compare their strengths and weaknesses, as well as the relationships among them, to reveal their statistical essence.

Second, two new similarity measures are proposed: mutual information (MI) based similarity and L1 distance-based similarity. MI-based similarity results from applying statistics and information theory to recommender systems. Unlike Pearson's linear correlation coefficient, which accounts only for linear relationships, and the well-known rank correlation coefficients, which detect only monotonic dependencies, MI-based similarity takes all types of dependence into account. It can therefore be used to roughly explore the correlation between two objects by computing the distance between two related probability distributions. L1 distance-based similarity combines the well-known L1 distance with CF recommender systems; it is highly sensitive and easy to compute.

Finally, a two-level CF recommendation framework based on hierarchical thought is constructed. In this framework, two levels of information filtering are used to find truly similar neighbors. The first level relies mainly on MI-based similarity, while the second level uses other similarity measures such as cosine similarity, Pearson similarity, and L1 distance-based similarity. Experiments show that the framework can improve recommendation accuracy.
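The abstract does not give formulas for the proposed measures or for the two-level framework, so the following is only a rough sketch under stated assumptions. Mutual information is estimated from the empirical joint distribution of discrete ratings on co-rated items. L1 distance is mapped to a similarity as 1 / (1 + d). The first level keeps the `first_keep` candidates with the highest MI, and the second level ranks those survivors with another similarity. All function and parameter names here are hypothetical.

```python
import math
from collections import Counter

def co_rated(u, v):
    """Return both users' ratings restricted to the items both have rated."""
    common = sorted(set(u) & set(v))
    return [u[i] for i in common], [v[i] for i in common]

def mutual_information(u, v):
    """Empirical I(X;Y) in nats over co-rated items (assumed estimator)."""
    x, y = co_rated(u, v)
    n = len(x)
    if n == 0:
        return 0.0
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # I(X;Y) = sum_{a,b} p(a,b) * log( p(a,b) / (p(a) * p(b)) )
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())

def l1_similarity(u, v):
    """Assumed convention: similarity = 1 / (1 + L1 distance)."""
    x, y = co_rated(u, v)
    if not x:
        return 0.0
    return 1.0 / (1.0 + sum(abs(a - b) for a, b in zip(x, y)))

def two_level_neighbors(target, others, first_keep, k, second_sim=l1_similarity):
    """Level 1: keep the first_keep users with highest MI to the target.
    Level 2: rank those survivors by second_sim and return the top k names."""
    candidates = sorted(others,
                        key=lambda name: mutual_information(target, others[name]),
                        reverse=True)[:first_keep]
    ranked = sorted(candidates,
                    key=lambda name: second_sim(target, others[name]),
                    reverse=True)
    return ranked[:k]

# Hypothetical toy data, not taken from the thesis.
target = {"a": 1, "b": 1, "c": 2, "d": 2}
others = {
    "u1": {"a": 1, "b": 1, "c": 2, "d": 2},  # identical ratings
    "u2": {"a": 2, "b": 2, "c": 1, "d": 1},  # fully dependent, but inverted
    "u3": {"a": 1, "b": 2, "c": 1, "d": 2},  # statistically independent
}
print(two_level_neighbors(target, others, first_keep=2, k=2))
```

On this toy data, `u2` has the same MI as `u1` (ln 2) even though its ratings are inverted, which illustrates that MI captures dependence that is not a positive linear relationship. `u3` has zero MI and is dropped at the first level although its L1 similarity to the target is higher than that of `u2`. With so few co-rated items an empirical MI estimate is unreliable, so a practical system would need far more data per pair or a smoothed estimator.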
Keywords/Search Tags:Recommendation, Collaborative filtering, Mutual information, L1 Distance, Hierarchical thought