Improvement Of Collaborative Filtering Recommendation Algorithm And Its Parallelization On Hadoop Platform

Posted on: 2019-11-26
Degree: Master
Type: Thesis
Country: China
Candidate: S Q Wen
GTID: 2428330566493541
Subject: Computer Science and Technology
Abstract/Summary:
Recommendation technology is one of the most important technologies in e-commerce systems, and collaborative filtering is currently the most widely used recommendation algorithm. Although the traditional collaborative filtering algorithm achieves intelligent recommendation, it still suffers from low recommendation accuracy, insufficient personalization, and low efficiency. To solve these problems, this thesis proposes a collaborative filtering recommendation model based on the fusion of the correlation-weighted method and the item optimal-weighted method, a collaborative filtering recommendation algorithm based on user category preferences, a collaborative filtering recommendation algorithm based on a Naive Bayesian regression model, and a parallel Naive Bayesian regression model deployed on the Hadoop platform. The detailed work is as follows.

First, the recommendation results of the traditional collaborative filtering algorithm are not personalized enough, and existing optimization methods do not consider the effect on accuracy when improving the degree of personalization. To solve these problems, this thesis proposes a collaborative filtering recommendation model based on the fusion of the correlation-weighted method and the item optimal-weighted method. In the first step, correlation weighting is applied with an optimal threshold to ensure stability when users share few co-rated items. In the second step, to meet the optimization goal of minimizing the mean absolute error, optimal item weights are introduced into the rating prediction step; the PSO (Particle Swarm Optimization) algorithm is used to optimize the item weights, which reduces the impact of popular items and helps discover unpopular ones. In the last step, the correlation-weighted method and the item optimal-weighted method are fused. Verification results on the public MovieLens dataset show that the algorithm effectively improves recommendation accuracy, coverage, recall, and average popularity.

Second, the traditional collaborative filtering algorithm does not consider the user's category preferences, which leads to low accuracy. Existing improved algorithms that use user preferences, however, reduce efficiency while improving accuracy. To solve these problems, this thesis proposes a collaborative filtering recommendation algorithm based on user category preferences. The algorithm divides item categories into a user's preferred and non-preferred categories according to the proportion of the user's non-preferred items within each item attribute. Similarity is calculated only when there are items with the same category preference as the item whose rating is to be predicted; otherwise, the similarity calculation is skipped. Verification results on the public MovieLens dataset show that the algorithm improves efficiency, accuracy, and coverage at the same time.

Third, the memory-based collaborative filtering algorithm has low computational efficiency and a low prediction success rate. To improve efficiency and precision, this thesis uses the Naive Bayes model for collaborative filtering recommendation. However, the common Naive Bayesian model has difficulty processing continuous data. Therefore, this thesis proposes a collaborative filtering recommendation algorithm based on a Naive Bayesian regression model. First, users and items are defined as independent attributes, and discrete rating values are used as the classification categories. Second, the Naive Bayesian model is used to predict the probability that a user and item fall into each rating category. Finally, the classification results are used for regression prediction: the user's expected rating of the item is taken as the predicted score. Experimental results on the discrete MovieLens dataset and the continuous Jester dataset show that this method greatly improves the success rate and efficiency of prediction compared with traditional methods.

Fourth, to further improve the efficiency of the algorithm and to handle "massive data" application scenarios, the collaborative filtering recommendation algorithm based on the Naive Bayes regression model is parallelized on the Hadoop platform. First, the parallelizability of the Naive Bayes regression model is analyzed, and a theoretical model of the parallel Naive Bayes regression model is constructed. Second, the algorithm is implemented on the Hadoop Distributed File System and the MapReduce framework. Experimental results on the Netflix dataset show that the method has higher scalability and lower time and space overhead, improving efficiency without sacrificing accuracy.
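
As an illustration of the third contribution above, the Naive Bayesian regression idea can be sketched in Python. This is a minimal sketch under stated assumptions, not the thesis implementation: the function names (`train`, `posterior`, `predict`), the Laplace smoothing parameter `alpha`, and the input layout of (user, item, rating) triples are all hypothetical. The user and the item are treated as two independent attributes, the discrete rating values are the classes, and the predicted score is the expected rating under the class posterior.

```python
from collections import Counter, defaultdict


def train(ratings, alpha=1.0):
    """Count statistics for a two-attribute Naive Bayes model.

    ratings: iterable of (user, item, rating) with discrete rating values.
    alpha:   Laplace smoothing constant (an assumption of this sketch).
    """
    class_count = Counter()             # class_count[r]   = number of ratings equal to r
    user_count = defaultdict(Counter)   # user_count[r][u] = times user u gave rating r
    item_count = defaultdict(Counter)   # item_count[r][i] = times item i received rating r
    users, items = set(), set()
    for u, i, r in ratings:
        class_count[r] += 1
        user_count[r][u] += 1
        item_count[r][i] += 1
        users.add(u)
        items.add(i)
    return {
        "class_count": class_count,
        "user_count": user_count,
        "item_count": item_count,
        "n_users": len(users),
        "n_items": len(items),
        "alpha": alpha,
    }


def posterior(model, u, i):
    """Return P(r | u, i) for every rating class r seen in training."""
    cc = model["class_count"]
    a = model["alpha"]
    total = sum(cc.values())
    scores = {}
    for r, n_r in cc.items():
        prior = n_r / total
        # Smoothed likelihoods of the user and the item given rating class r.
        p_u = (model["user_count"][r][u] + a) / (n_r + a * model["n_users"])
        p_i = (model["item_count"][r][i] + a) / (n_r + a * model["n_items"])
        scores[r] = prior * p_u * p_i
    z = sum(scores.values())
    return {r: s / z for r, s in scores.items()}


def predict(model, u, i):
    """Regression step: the expected rating under the class posterior."""
    return sum(r * p for r, p in posterior(model, u, i).items())
```

Because training only accumulates counts per rating class, the counting step is the kind of work that maps naturally onto a MapReduce job, which is consistent with the parallelization described in the fourth contribution; the sketch itself runs on a single machine.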
Keywords/Search Tags: Collaborative filtering, weighting, User category preferences, Naive Bayesian regression model, Hadoop parallelization