Research On Collaborative Filtering Algorithm In E-Commerce Recommender System

Posted on: 2012-06-09
Degree: Master
Type: Thesis
Country: China
Candidate: H M Li
GTID: 2178330335450372
Subject: Computer application technology
Abstract/Summary:
The accelerating pace of informatization has given rise to a modern productive force, e-commerce, which combines information technology, business technology and management techniques and is in an unprecedented period of development. It is generating a high volume of online transactions and promoting the rapid development of sectoral, regional, national and world economies. The recommender system is an important module of e-commerce: on the one hand it helps buyers choose from a wide range of items, and on the other hand it helps sellers turn buyers' latent demand into actual purchases, so an e-commerce recommender system can be called a "virtual salesman". Filtering technology for recommendation can be divided into two types: content-based filtering and collaborative filtering. Collaborative filtering is superior to content-based filtering in three respects: (1) the ability to filter any type of content, e.g. text, artwork, music, mutual funds; (2) the ability to filter based on complex and hard-to-represent concepts, such as taste and quality; (3) the ability to make serendipitous recommendations. Collaborative filtering is one of the most successful and most widely used techniques in e-commerce recommender systems. However, as the amount of data grows, it has also exposed some problems, such as sparsity, the trade-off between recommendation accuracy and real-time performance, cold start, and scalability. This thesis studies these problems. The major contents are as follows: (1) Research on the traditional collaborative filtering algorithms: the User-Based collaborative filtering algorithm and the Item-Based collaborative filtering algorithm.
The basic calculation steps can be divided into the following stages: the representation stage (turning a list of user ratings into a user-item rating matrix), the calculation of user (or item) similarity, the selection of nearest neighbors, and the generation of recommendations. The user-item rating matrix is a high-dimensional sparse matrix. For user (item) similarity computation there are three main measures: cosine similarity, modified cosine similarity, and Pearson correlation similarity. For nearest-neighbor selection there are two main methods: center-based neighbor selection and aggregate neighbor selection. For the recommendation strategy there are also two methods: weight-based and frequency-based. (2) For the core step of the collaborative filtering algorithm, the calculation of user (item) similarity, the similarity calculation is improved by combining the cloud model with the K-means algorithm. The backward cloud algorithm converts quantitative values into the qualitative characteristics of a concept, based on the frequency vector of a user's item ratings. The three numerical characteristics of the cloud (expected value Ex, entropy En and hyper-entropy He) are used to represent the user's preferences, forming a user rating feature vector. By computing the cosine of the angle between users' rating feature vectors instead of between their raw rating vectors, the similarity between objects, including object attribute information and user similarity, can be considered while ignoring fine-grained details. To alleviate the sparsity problem of traditional collaborative filtering algorithms, the definition of the sparsity problem and common solutions are given first. There are two main approaches: filling (turning the sparse matrix into a dense matrix), and improving algorithm accuracy without changing the sparse matrix.
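
The cloud-model step described above can be illustrated with a minimal sketch. It assumes the commonly cited backward cloud generator without certainty degrees, applied directly to a list of one user's ratings rather than to the rating frequency vector the thesis mentions; the function names `backward_cloud` and `cosine`, and the use of an absolute value to keep the hyper-entropy real on small samples, are assumptions of this sketch, not details taken from the thesis.

```python
import math


def backward_cloud(samples):
    """Estimate the cloud features (Ex, En, He) from numeric samples.

    Sketch of the certainty-free backward cloud generator (assumed variant).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    # Expected value: the sample mean.
    ex = sum(samples) / n
    # Entropy: first-order absolute central moment scaled by sqrt(pi / 2).
    en = math.sqrt(math.pi / 2) * sum(abs(x - ex) for x in samples) / n
    # Sample variance with an (n - 1) denominator.
    var = sum((x - ex) ** 2 for x in samples) / (n - 1)
    # Hyper-entropy; abs() is an assumption that guards against var < En^2.
    he = math.sqrt(abs(var - en ** 2))
    return ex, en, he


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical ratings of two users on a 1-5 scale.
features_a = backward_cloud([5, 4, 4, 5, 3])
features_b = backward_cloud([2, 1, 3, 2, 2])
similarity = cosine(features_a, features_b)
```

Comparing the three-element feature vectors instead of the raw rating vectors means two users need not have rated the same items to be compared, which is one reason such a representation may help with sparse data.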
Then, to address the sparsity problem, I compare the advantages and disadvantages of the various methods and give the steps of the default-value-based nearest-neighbor algorithm and the SVD-based nearest-neighbor algorithm. To introduce the PCA-based collaborative filtering algorithm, an overview of existing dimension reduction methods is given, and the advantages and disadvantages of common dimension reduction methods are compared. The focus is on principal component analysis (PCA), including its mathematical model, geometric significance and algorithm steps. On this basis, I propose a PCA-based collaborative filtering algorithm whose basic steps are: standardizing the user-item rating matrix, forming the Pearson correlation matrix, performing principal component analysis, and recursive matrix clustering. The follow-up experiments confirm that recommendation quality is significantly improved. (3) I designed multiple sets of comparative experiments to evaluate the above algorithms. The experimental data set is the MovieLens data set (http://MovieLens.umn.edu/) collected by the GroupLens Research project at the University of Minnesota. It contains ratings by 943 users on 1682 items, nearly one hundred thousand records in total. The sparsity level of this data set is 0.9369. The metric used in the experiments is the Mean Absolute Error (MAE); the smaller the MAE, the higher the quality of the recommendations. First, for the traditional collaborative filtering algorithms, I compared the performance of the Item-Based collaborative filtering algorithm and the User-Based collaborative filtering algorithm under three standard similarity measures (cosine similarity, modified cosine similarity and Pearson correlation similarity).
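
The two numeric quantities above, the sparsity level and the MAE, can be sketched as follows. The sketch assumes the usual definitions (sparsity as one minus the fraction of filled cells in the user-item matrix, MAE as the mean of absolute prediction errors) and assumes exactly 100,000 ratings, whereas the abstract says only "nearly" one hundred thousand; the function names are hypothetical.

```python
def sparsity_level(num_users, num_items, num_ratings):
    """Fraction of empty cells in a num_users x num_items rating matrix."""
    return 1.0 - num_ratings / (num_users * num_items)


def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Assumption: exactly 100,000 ratings for 943 users and 1682 items.
level = sparsity_level(943, 1682, 100_000)

# Hypothetical predictions against held-out ratings.
error = mean_absolute_error([4.0, 3.0, 5.0], [5.0, 3.0, 3.0])
```

Under that assumption the sparsity comes out at roughly 0.937, in line with the 0.9369 reported in the abstract.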
Then I compared the Item-Based collaborative filtering algorithm with the User-Based collaborative filtering algorithm under the same similarity measure. The results show that, whichever standard similarity measure is used, the Item-Based collaborative filtering algorithm performs better than the User-Based collaborative filtering algorithm. Finally, I analyzed the other factors that affected the experimental results: the ratio of training set to test set and the size of the neighbor set. In the experiments, the value of x (the train/test set ratio) was increased from 0.2 to 0.9 in steps of 0.1; prediction quality improved as x increased, with the best recommendation quality achieved at x = 0.8. Taking all factors into account, the appropriate neighbor set size was 30. For the cloud model, I compared the cloud-model-based similarity calculation with the three standard similarity measures; the cloud-model-based calculation performs better than the standard measures, as shown by lower MAE values. Finally, I compared the results of the PCA-based collaborative filtering algorithm with those of the default-value-based collaborative filtering algorithm.
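
As an illustration of the Item-Based approach compared above, here is a minimal sketch of a weighted-sum prediction over the k most similar items. It assumes plain cosine similarity over co-rating users and a nested-dictionary rating store; the names `item_cosine` and `predict` and the data layout are hypothetical, and the default k = 30 simply echoes the neighbor set size reported in the abstract.

```python
import math


def item_cosine(ratings, i, j):
    """Cosine similarity between items i and j over users who rated both."""
    common = [u for u in ratings if i in ratings[u] and j in ratings[u]]
    if not common:
        return 0.0
    dot = sum(ratings[u][i] * ratings[u][j] for u in common)
    norm_i = math.sqrt(sum(ratings[u][i] ** 2 for u in common))
    norm_j = math.sqrt(sum(ratings[u][j] ** 2 for u in common))
    if norm_i == 0 or norm_j == 0:
        return 0.0
    return dot / (norm_i * norm_j)


def predict(ratings, user, item, k=30):
    """Predict a rating as the similarity-weighted mean of the user's own
    ratings on the k items most similar to the target item.

    Returns None when no rated item has a non-zero similarity to the target.
    """
    scored = [
        (item_cosine(ratings, item, other), rating)
        for other, rating in ratings[user].items()
        if other != item
    ]
    # Keep the k nearest neighbors by similarity.
    neighbors = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    weight_total = sum(abs(sim) for sim, _ in neighbors)
    if weight_total == 0:
        return None
    return sum(sim * rating for sim, rating in neighbors) / weight_total


# Hypothetical ratings: user -> {item: rating}.
ratings = {
    "u1": {"a": 5, "b": 5},
    "u2": {"a": 3, "b": 3, "c": 4},
    "u3": {"a": 4, "c": 4},
}
estimate = predict(ratings, "u1", "c")
```

A User-Based variant would follow the same pattern with the roles of users and items swapped; swapping `item_cosine` for a modified cosine or Pearson measure changes only the similarity function.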
Keywords/Search Tags: Recommender System, E-commerce, Item-Based, User-Based, Dimension Reduction, PCA, MovieLens