Research And Implementation Of Collaborative Filtering Recommendation Algorithm Based On Hadoop

Posted on: 2016-08-30; Degree: Master; Type: Thesis
Country: China; Candidate: C L Zhang
GTID: 2298330452966430; Subject: Software engineering
Abstract/Summary:
Personalized recommendation systems have become a hot topic in the field of commercial systems. Among the techniques they use, collaborative filtering, proposed by Goldberg et al., has been widely recognized and applied in all kinds of personalized recommendation systems because of its strong universality. As the numbers of users and goods in recommendation systems keep growing, the drawbacks of existing collaborative filtering algorithms have gradually been exposed: the cold-start problem, data sparsity, limited scalability, and insufficient accuracy of the results. Recent studies show that collaborative filtering algorithms based on distributed storage and distributed computation of user data adapt better to these new requirements and can achieve better recommendation quality. In existing distributed collaborative filtering algorithms, however, the distributed storage of user data makes it difficult to locate the nearest neighbors of the target user, which seriously affects both the efficiency and the accuracy of distributed collaborative filtering recommendation.

This thesis carried out an in-depth analysis and discussion of distributed collaborative filtering. Its main research contents are as follows. Based on the development history and the classification of collaborative filtering recommendation algorithms, the thesis summarized the shortcomings of existing single-mode filtering techniques and of distributed collaborative filtering techniques. To weaken or overcome these drawbacks, it drew on the distributed computing model of Hadoop, proposed a Hadoop-based distributed collaborative filtering recommendation algorithm, and gave the calculation process of the new algorithm. The new algorithm improved the effectiveness of recommendation along two dimensions.
That is, we not only improved the collaborative filtering recommendation algorithm itself, but also used Hadoop to implement it as a distributed algorithm built on the MapReduce model. The expensive computation was divided into many small calculation steps, each of which can run in parallel on different nodes. The algorithm employed an efficient partitioning strategy to maximize data locality and reduce communication cost, and it kept algorithmic complexity under control to make better use of the available computing power, so that it also scaled well on large data sets. Finally, the thesis verified the effectiveness of the proposed distributed collaborative filtering algorithm through a number of experiments on the standard MovieLens data sets. Compared with the traditional algorithms, the experimental results showed that the new algorithm achieved higher recommendation accuracy and performed better overall. We then optimized the data source of the algorithm with the distributed database HBase, which provides good support for storing sparse data, to further improve the performance and practicality of the new algorithm.
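The abstract does not give the algorithm's formulas or job layout, so the following is only a minimal sketch of one common way to express user-based collaborative filtering as a chain of MapReduce jobs. It assumes cosine similarity over co-rated items and a similarity-weighted average for prediction. The map, shuffle and reduce phases run in a single Python process rather than on Hadoop. The function names (`run_job`, `map_by_item`, `reduce_item_pairs`, `reduce_cosine`, `predict`), the CRC32-based partitioner and the toy ratings are hypothetical illustrations. They are not the thesis's implementation, not Hadoop's Java API, and not MovieLens data.

```python
import zlib
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical toy ratings (user, item, rating); NOT MovieLens data.
RATINGS = [
    ("u1", "i1", 5), ("u1", "i2", 3), ("u1", "i3", 4),
    ("u2", "i1", 4), ("u2", "i2", 2), ("u2", "i4", 5),
    ("u3", "i2", 5), ("u3", "i3", 1), ("u3", "i4", 2),
]

def partition(key, num_partitions):
    """Deterministic hash partitioning, similar in spirit to Hadoop's default
    hash partitioner: every occurrence of a key goes to the same reducer."""
    return zlib.crc32(repr(key).encode("utf-8")) % num_partitions

def run_job(records, mapper, reducer, num_partitions=2):
    """Single-process simulation of one MapReduce job (map, shuffle, reduce).
    On a cluster each partition could be reduced on a different node."""
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for record in records:
        for key, value in mapper(record):
            partitions[partition(key, num_partitions)][key].append(value)
    output = []
    for groups in partitions:
        for key in sorted(groups):
            output.extend(reducer(key, groups[key]))
    return output

# Job 1: group ratings by item, then emit partial cosine terms per co-rating user pair.
def map_by_item(rating):
    user, item, score = rating
    yield item, (user, score)

def reduce_item_pairs(item, user_scores):
    # Sorting by user id gives every pair the same key (ua < ub) across all items.
    for (ua, ra), (ub, rb) in combinations(sorted(user_scores), 2):
        yield (ua, ub), (ra * rb, ra * ra, rb * rb)

# Job 2: sum the partial terms per user pair and emit cosine similarity over co-rated items.
def identity_map(pair_and_terms):
    yield pair_and_terms

def reduce_cosine(pair, terms):
    dot = sum(t[0] for t in terms)
    norm_a = sum(t[1] for t in terms)
    norm_b = sum(t[2] for t in terms)
    yield pair, dot / sqrt(norm_a * norm_b)

def predict(target, item, ratings, sims, k=2):
    """Similarity-weighted average over the k most similar neighbours who rated `item`."""
    rated = {u: r for u, i, r in ratings if i == item and u != target}
    neighbours = []
    for (ua, ub), s in sims.items():
        other = ub if ua == target else ua if ub == target else None
        if other in rated:
            neighbours.append((s, rated[other]))
    top = sorted(neighbours, reverse=True)[:k]
    weight = sum(abs(s) for s, _ in top)
    return sum(s * r for s, r in top) / weight if weight else None

pair_terms = run_job(RATINGS, map_by_item, reduce_item_pairs)
sims = dict(run_job(pair_terms, identity_map, reduce_cosine))
print(round(predict("u1", "i4", RATINGS, sims), 2))
```

In this sketch each user pair maps to exactly one key, so the reducers of the second job work on disjoint pairs independently. The expensive step is the first job, where an item rated by n users emits n(n-1)/2 pairs. The partitioning and HBase storage choices described in the abstract are not reproduced here.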
Keywords/Search Tags: Collaborative Filtering, Hadoop, MapReduce, MovieLens, HBase