Study On Improved Clustering Collaborative Filtering Algorithm Based On Hadoop

Posted on: 2016-03-02
Degree: Master
Type: Thesis
Country: China
Candidate: A N Li
Full Text: PDF
GTID: 2308330479984849
Subject: Computer software and theory
Abstract/Summary:
Collaborative Filtering (CF) is one of the most successful algorithms used in Recommendation Systems (RS) and plays an important role in addressing information overload. Using historical rating records and other preference data, a CF algorithm computes the similarity between users or items to find a nearest-neighbor set and then predicts users' ratings, drawing on collective intelligence to provide recommendations.
Many CF algorithms have been proposed in the academic literature. They have greatly improved the performance of RS and the quality of its recommendations. With the explosion of information and data on the Internet, however, we have entered the era of big data, which is characterized by Volume, Variety, Velocity and Value. These characteristics place higher demands on hardware and software. They also amplify the sparsity and scalability problems of traditional CF algorithms, so that RS cannot provide effective recommendations to users at big-data scale.
To mitigate the sparsity and scalability problems of CF in big data, this thesis proposes an improved clustering CF algorithm based on Hadoop. First, the ALS (Alternating Least Squares) matrix factorization algorithm is used offline to fill the high-dimensional, sparse user-item matrix. Second, the filled user-item matrix is clustered with an improved item clustering algorithm. Third, the candidate set of recommendations is built from the clusters and the similarities. Finally, recommendations are produced online. To further improve efficiency, the proposed CF algorithm takes advantage of distributed computing and is implemented on the Hadoop platform. Because the matrix factorization and item clustering steps can be computed offline, the RS can serve online recommendations more quickly.
Experiments on the MovieLens user-movie datasets show that the proposed CF algorithm not only provides high recommendation quality but also mitigates the sparsity and scalability problems of CF in big data.
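The four steps named in the abstract (ALS fill, item clustering, candidate set, online recommendation) can be illustrated with a minimal single-process sketch. It is not the thesis's implementation: it runs in plain Python rather than on Hadoop, a greedy cosine-threshold clustering stands in for the "improved item clustering algorithm" that the abstract does not specify, and all function names, parameters (`k`, `reg`, `threshold`) and the toy ratings are hypothetical choices made for illustration.

```python
import math
import random


def solve(a, b):
    """Solve the small dense system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def update(fixed, obs, k, reg):
    """One ALS step: ridge-regress a factor vector; obs = [(index, rating)]."""
    a = [[reg * (p == q) + sum(fixed[j][p] * fixed[j][q] for j, _ in obs)
          for q in range(k)] for p in range(k)]
    b = [sum(r * fixed[j][p] for j, r in obs) for p in range(k)]
    return solve(a, b)  # reg > 0 keeps the system non-singular


def als_fill(ratings, n_users, n_items, k=2, reg=0.1, iters=20, seed=0):
    """Step 1 (offline): factorise, then fill only the missing entries."""
    rng = random.Random(seed)
    V = [[rng.random() for _ in range(k)] for _ in range(n_items)]
    by_user = [[(j, r) for (i, j), r in ratings.items() if i == u]
               for u in range(n_users)]
    by_item = [[(i, r) for (i, j), r in ratings.items() if j == t]
               for t in range(n_items)]
    for _ in range(iters):
        U = [update(V, by_user[u], k, reg) for u in range(n_users)]
        V = [update(U, by_item[t], k, reg) for t in range(n_items)]
    return [[ratings.get((u, t), sum(x * y for x, y in zip(U[u], V[t])))
             for t in range(n_items)] for u in range(n_users)]


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def cluster_items(filled, threshold=0.95):
    """Step 2 (offline): an item joins the first cluster whose seed is similar."""
    cols = [[row[t] for row in filled] for t in range(len(filled[0]))]
    clusters = []  # lists of item ids; clusters[c][0] is the seed item
    for t in range(len(cols)):
        for members in clusters:
            if cosine(cols[t], cols[members[0]]) >= threshold:
                members.append(t)
                break
        else:
            clusters.append([t])
    return clusters, cols


def recommend(user, ratings, clusters, cols, top_n=2):
    """Steps 3-4 (online): candidates are unrated items in the user's clusters."""
    scores = {}
    for members in clusters:
        rated = [t for t in members if (user, t) in ratings]
        for cand in members:
            if (user, cand) in ratings or not rated:
                continue
            weights = [cosine(cols[cand], cols[t]) for t in rated]
            total = sum(weights)
            if total > 0:  # similarity-weighted mean of the user's ratings
                scores[cand] = sum(
                    w * ratings[(user, t)] for w, t in zip(weights, rated)
                ) / total
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Invented toy data: (user, item) -> rating on a 1-5 scale.
ratings = {
    (0, 0): 5, (0, 1): 4, (0, 2): 1,
    (1, 0): 4, (1, 1): 5, (1, 3): 1,
    (2, 0): 1, (2, 2): 5, (2, 3): 4,
    (3, 1): 1, (3, 2): 4, (3, 3): 5,
}
filled = als_fill(ratings, n_users=4, n_items=4)
clusters, cols = cluster_items(filled)
print("clusters:", clusters)
print("recommendations for user 0:", recommend(0, ratings, clusters, cols))
```

In this sketch only `recommend` corresponds to the online step. `als_fill` and `cluster_items` play the role of the offline stages, which the thesis runs as distributed jobs on Hadoop so that online requests need only a lookup within the precomputed clusters.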
Keywords/Search Tags: Collaborative filtering, Hadoop, Matrix factorization, Clustering