
Research On Distributed Collaborative Filtering Recommendation Algorithm Based On Hadoop

Posted on: 2019-11-12
Degree: Master
Type: Thesis
Country: China
Candidate: W C Liu
GTID: 2438330548972597
Subject: Computer Science and Technology
Abstract/Summary:
With the continuous innovation of Internet technology, information can now be disseminated on the Internet at essentially zero cost, producing huge amounts of data. The era of big data has arrived: data volume is growing explosively, and the problem of "information overload" urgently needs solving. The traditional search engine model can no longer meet people's need for diverse and individualized information, and personalized recommendation systems have emerged in response. Collaborative filtering, one of the most widely used algorithms in personalized recommendation systems, has developed quickly and attracted wide attention from scholars. However, when collaborative filtering recommendation algorithms are applied to computing over and storing large data sets, they suffer from poor scalability, extremely sparse rating data, and cold start, which greatly reduce the recommendation performance of a personalized recommendation system.

To address these problems, this thesis studies distributed collaborative filtering recommendation algorithms based on the Hadoop platform: it improves the traditional collaborative filtering algorithm, then designs and implements the improved algorithm on Hadoop. To overcome the shortcomings of traditional collaborative filtering (poor scalability, rating-data sparsity, and cold start for new users and new items), the main contents of this thesis are as follows:

(1) Neighborhood-based collaborative filtering. To overcome the limitations of the traditional collaborative filtering recommendation algorithm in single-machine mode, the traditional user-based and item-based collaborative filtering algorithms are improved. First, the cosine similarity formula of the traditional algorithm is improved, simplifying the similarity calculation for users and items. Second, the traditional algorithm is optimized using the idea of an "inverted table" to reduce the time complexity of similarity calculation. Finally, the two improved algorithms are given distributed designs and run on the Hadoop distributed computing platform using the MovieLens dataset. The experiments analyze accuracy, recall, and coverage, and on all three measures verify that the improved algorithms recommend better than the traditional algorithm.

(2) Model-based collaborative filtering. To address the defects of the traditional collaborative filtering algorithm, the following improvements are made. First, a clustering model is established: the widely used K-Means clustering algorithm is optimized with the Canopy algorithm to obtain the CK-Means clustering model, which is used to cluster the rating dataset and thereby alleviate the data sparsity of the traditional algorithm. Second, a rating prediction algorithm running on the Hadoop platform predicts and fills in the missing entries of the original dataset, yielding a complete user-item rating matrix. Third, UserCF and ItemCF based on the CK-Means clustering model are implemented in MapReduce, and the improved algorithms are run on the Hadoop platform over the complete user-item rating matrix. The thesis analyzes the advantages and disadvantages of the proposed User Model-CF and Item Model-CF recommendation algorithms, as well as the domains in which each recommends well.
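
The "inverted table" idea in part (1) can be illustrated with a minimal single-machine sketch. This is not the thesis's implementation: the abstract does not give the improved cosine formula or the MapReduce design, so the code below assumes the standard set-based cosine similarity, ignores rating values, and uses a hypothetical function name (`user_similarity`) and input format (user mapped to the set of rated items).

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def user_similarity(ratings):
    """Cosine similarity between users via an item -> users inverted table.

    `ratings` maps each user to the set of items that user has rated
    (hypothetical input format; rating values are ignored in this sketch).
    User ids must be sortable so each pair is stored once as (u, v), u < v.
    """
    # Build the inverted table: item -> set of users who rated it.
    item_users = defaultdict(set)
    for user, items in ratings.items():
        for item in items:
            item_users[item].add(user)

    # Count co-rated items only for user pairs that share at least one item.
    co_rated = defaultdict(int)
    for users in item_users.values():
        for u, v in combinations(sorted(users), 2):
            co_rated[(u, v)] += 1

    # Normalise each count: |N(u) & N(v)| / sqrt(|N(u)| * |N(v)|).
    return {
        (u, v): count / sqrt(len(ratings[u]) * len(ratings[v]))
        for (u, v), count in co_rated.items()
    }


# Toy data (made up for illustration, not taken from MovieLens).
ratings = {
    "A": {"a", "b", "d"},
    "B": {"a", "c"},
    "C": {"b", "e"},
    "D": {"c", "d", "e"},
}
sim = user_similarity(ratings)
```

User pairs with no item in common, such as B and C here, are never visited, which is where the saving over an all-pairs comparison comes from when the rating data is sparse. In a Hadoop setting, one plausible mapping (an assumption, not taken from the thesis) is to build the item-to-users table in one MapReduce job keyed by item and to sum the pair counts in a second job keyed by user pair.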
Keywords/Search Tags: Collaborative Filtering, Scalability, Sparsity, Cold Start, Hadoop, CK-Means