
Collaborative Filtering Recommendation Algorithm On Data Sparsity Problem From Statistical Perspective

Posted on: 2017-08-21
Degree: Master
Type: Thesis
Country: China
Candidate: J J Zhang
GTID: 2348330485991645
Subject: Statistics
Abstract/Summary:
With the spread of the Internet and the rapid growth of e-commerce, information resources are increasing explosively, and it has become more and more difficult for users to find the information or goods they want quickly and accurately among such massive resources. Recommender systems were developed to address this problem, and the recommendation algorithm is their core technology. Among the many recommendation algorithms, collaborative filtering is currently the most widely used and the most successful. It relies mainly on the ratings that users give to items online. In practice, however, the numbers of users and items are very large while each user rates only a small fraction of the items they access, so the user-item rating matrix is severely sparse. This data sparsity is one of the main reasons for the low accuracy of traditional collaborative filtering.

This thesis studies collaborative filtering under data sparsity from a statistical perspective. It implements simple recommendations based on descriptive statistics and examines how collaborative filtering performs when missing ratings are handled with statistics filling, cluster analysis, and matrix decomposition. After a detailed analysis of the causes of data sparsity and its impact on collaborative filtering, the thesis proposes statistics filling methods to mitigate the sparsity. On this basis, it uses K-Means clustering to group users, chooses the number of user groups by the silhouette coefficient, and fills each user's missing ratings with rating statistics computed within the same group.
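The statistics filling and group-wise filling steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis's own implementation (which used Excel and R): the toy rating matrix, the use of the user's mean as the filling statistic, the use of the within-group item mean for group-wise filling, and all function names (`user_mean_fill`, `kmeans`, `silhouette`, `best_k`, `cluster_fill`) are hypothetical. Users are clustered on their mean-filled rating vectors, the number of groups is the candidate with the highest mean silhouette coefficient, and each missing rating is then replaced by the average rating that users in the same group gave that item.

```python
import math
import random

# Hypothetical toy data (not from the thesis): rows = users, columns = items, None = unrated.
TOY_RATINGS = [
    [5, 4, None, 1],
    [4, None, 5, 1],
    [1, 2, None, 5],
    [None, 1, 2, 4],
]

def user_mean_fill(R):
    """Statistics filling (assumed statistic): replace each None with the user's mean rating."""
    filled = []
    for row in R:
        obs = [r for r in row if r is not None]
        mu = sum(obs) / len(obs) if obs else 0.0
        filled.append([mu if r is None else float(r) for r in row])
    return filled

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's K-Means; returns one cluster label per row of X."""
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(X, k)]
    labels = [0] * len(X)
    for _ in range(iters):
        # Assign every user vector to its nearest center (Euclidean distance).
        labels = [min(range(k), key=lambda j: math.dist(x, centers[j])) for x in X]
        for j in range(k):
            members = [x for x, lab in zip(X, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def silhouette(X, labels):
    """Mean silhouette coefficient s(i) = (b - a) / max(a, b); singletons score 0."""
    n = len(X)
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0  # undefined for a single cluster; rank it last

    def mean_dist(i, c):
        d = [math.dist(X[i], X[j]) for j in range(n) if labels[j] == c and j != i]
        return sum(d) / len(d) if d else 0.0

    total = 0.0
    for i in range(n):
        if labels.count(labels[i]) == 1:
            continue  # singleton cluster contributes 0
        a = mean_dist(i, labels[i])                                # own-cluster cohesion
        b = min(mean_dist(i, c) for c in clusters if c != labels[i])  # nearest other cluster
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n

def best_k(X, candidates):
    """Choose the number of user groups with the highest mean silhouette."""
    return max(candidates, key=lambda k: silhouette(X, kmeans(X, k)))

def cluster_fill(R, labels):
    """Fill each missing rating with the item's mean among same-group users,
    falling back to the user's own mean when no group member rated the item."""
    base = user_mean_fill(R)
    out = []
    for u, row in enumerate(R):
        peers = [v for v in range(len(R)) if labels[v] == labels[u]]
        new_row = []
        for i, r in enumerate(row):
            if r is None:
                obs = [R[v][i] for v in peers if R[v][i] is not None]
                new_row.append(sum(obs) / len(obs) if obs else base[u][i])
            else:
                new_row.append(float(r))
        out.append(new_row)
    return out

if __name__ == "__main__":
    X = user_mean_fill(TOY_RATINGS)  # vectors to cluster on
    k = best_k(X, [2, 3])
    labels = kmeans(X, k)
    for row in cluster_fill(TOY_RATINGS, labels):
        print(row)
```

On this toy matrix, users 0 and 1 and users 2 and 3 have opposite tastes, so the silhouette criterion separates them at k = 2, and each gap is filled from the like-minded partner rather than from a global average. The fallback to the user's mean keeps the sketch defined when no group member has rated an item.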
Besides filling missing ratings with fixed values, the thesis uses singular value decomposition (SVD) for dimensionality reduction to predict ratings, fills the original matrix with the predicted ratings, and performs collaborative filtering on the resulting user rating matrix. Finally, to refine the recommendation process itself, the thesis replaces the traditional user-user similarity measure with a weighted similarity that combines the similarity of user preferences with the similarity of user ratings. The methods are evaluated on the MovieLens datasets, and the mean absolute error (MAE) is used to compare how much each method improves the recommendation algorithm; the computations are implemented mainly in Excel, with R as an auxiliary programming language. Experimental results show that the proposed methods alleviate data sparsity to a certain extent and thereby improve recommendation quality. Statistics filling, clustering, and similarity calculation are all statistics-based approaches. The thesis argues that applying statistics to recommendation should not focus only on complex models: bringing basic statistical methods into the study of recommendation algorithms may also effectively solve the problems these algorithms face. In the future, statistical methods are expected to find wider application and further development in this field.
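The SVD filling, weighted similarity, and MAE evaluation described above can be sketched as follows. The abstract does not give the exact formulas, so everything here is an assumption for illustration: the rank-k reconstruction is computed by power iteration with deflation instead of a library SVD call, "rating similarity" is taken to be Pearson correlation over co-rated items, "preference similarity" is a hypothetical Jaccard overlap of items rated at least 4, and the blend weight `alpha` is a hypothetical parameter. Predictions use the standard mean-centred user-based collaborative filtering formula, and all function names are illustrative.

```python
import math

def row_mean(row):
    """Mean of a user's observed ratings (None = unrated); 0.0 if the user rated nothing."""
    obs = [r for r in row if r is not None]
    return sum(obs) / len(obs) if obs else 0.0

def user_mean_fill(R):
    """Baseline fill before the SVD step (assumption): each None becomes the user's mean."""
    return [[row_mean(row) if r is None else float(r) for r in row] for row in R]

def rank_k_approx(A, k, iters=200):
    """Rank-k approximation of A via power iteration with deflation.

    Stands in for a library SVD call: each pass estimates the leading singular
    pair of the deflated matrix B and moves sigma * u * v^T from B into the result.
    """
    m, n = len(A), len(A[0])
    B = [list(map(float, row)) for row in A]
    approx = [[0.0] * n for _ in range(m)]
    for _ in range(k):
        v = [1.0 / math.sqrt(n)] * n  # fixed start vector keeps the sketch deterministic
        for _ in range(iters):
            u = [sum(B[i][j] * v[j] for j in range(n)) for i in range(m)]
            w = [sum(B[i][j] * u[i] for i in range(m)) for j in range(n)]  # w = B^T B v
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                return approx  # nothing left to approximate
            v = [x / norm for x in w]
        u = [sum(B[i][j] * v[j] for j in range(n)) for i in range(m)]  # sigma * left vector
        for i in range(m):
            for j in range(n):
                approx[i][j] += u[i] * v[j]
                B[i][j] -= u[i] * v[j]
    return approx

def svd_fill(R, k):
    """Keep observed ratings; replace missing ones with the rank-k prediction."""
    approx = rank_k_approx(user_mean_fill(R), k)
    return [[approx[u][i] if r is None else float(r) for i, r in enumerate(row)]
            for u, row in enumerate(R)]

def pearson(R, u, v):
    """Rating similarity (assumed): Pearson correlation over co-rated items, 0 if undefined."""
    common = [i for i, (a, b) in enumerate(zip(R[u], R[v])) if a is not None and b is not None]
    if len(common) < 2:
        return 0.0
    mu = sum(R[u][i] for i in common) / len(common)
    mv = sum(R[v][i] for i in common) / len(common)
    num = sum((R[u][i] - mu) * (R[v][i] - mv) for i in common)
    den = math.sqrt(sum((R[u][i] - mu) ** 2 for i in common) *
                    sum((R[v][i] - mv) ** 2 for i in common))
    return num / den if den else 0.0

def preference_sim(R, u, v, like=4):
    """Hypothetical preference similarity: Jaccard overlap of items rated >= like."""
    lu = {i for i, r in enumerate(R[u]) if r is not None and r >= like}
    lv = {i for i, r in enumerate(R[v]) if r is not None and r >= like}
    return len(lu & lv) / len(lu | lv) if lu | lv else 0.0

def weighted_sim(R, u, v, alpha=0.5):
    """Hypothetical blend: alpha * preference similarity + (1 - alpha) * rating similarity."""
    return alpha * preference_sim(R, u, v) + (1 - alpha) * pearson(R, u, v)

def predict(R, u, i, sim=weighted_sim):
    """User-based CF: u's mean plus similarity-weighted deviations of users who rated i."""
    terms = [(sim(R, u, v), R[v][i] - row_mean(R[v]))
             for v in range(len(R)) if v != u and R[v][i] is not None]
    norm = sum(abs(s) for s, _ in terms)
    if norm == 0:
        return row_mean(R[u])
    return row_mean(R[u]) + sum(s * d for s, d in terms) / norm

def mae(predicted, actual):
    """Mean absolute error between two equal-length sequences of ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)
```

To compare methods in the way the abstract describes, one would hold out a set of known ratings, predict them (for example with `predict` on the original matrix, or on the output of `svd_fill`), and compare the resulting `mae` values; a lower MAE means the predictions are closer to the held-out ratings.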
Keywords/Search Tags: statistics, collaborative filtering, recommendation algorithm, data sparsity