
The Research And Application Of Collaborative Filtering Algorithm Based On Spark

Posted on: 2017-05-02
Degree: Master
Type: Thesis
Country: China
Candidate: F Wang
GTID: 2428330566953041
Subject: Software engineering
Abstract/Summary:
With the rapid development and wide application of Internet technology, e-commerce has gradually matured, and online shopping accounts for an ever larger share of purchases. However, as the variety of goods keeps growing, it becomes increasingly difficult for users to pick out what they really need. Recommendation systems on e-commerce platforms alleviate this information-overload problem to some extent: by providing personalized product recommendations based on user behavior profiles or historical data, they spare users much of the manual screening work. Collaborative filtering is the most widely used and most mature recommendation algorithm in e-commerce recommendation systems. It has achieved some success in practical applications, but it still has defects related to data sparsity, implicit feedback, cold start, and accuracy. In addition, given today's large numbers of users and goods, the original recommendation algorithms increasingly expose problems of scalability, stability, and real-time performance.

Based on these factors, and after studying research on collaborative filtering recommendation in China and abroad together with the common problems of the algorithm, this thesis improves the item-based collaborative filtering algorithm in two places: in the data preprocessing stage it handles implicit feedback, and in the similarity calculation stage it introduces a comprehensive similarity based on both the user's historical behavior and the item description information. Using currently popular big data processing technology, the improved algorithm is parallelized on the distributed computing platform Spark, and the parallel algorithm is finally applied in a practical system. The main work and research results of this thesis are as follows:

(1) Few users give ratings directly in e-commerce systems, so the historical behavior data of users is used instead: the implicit feedback data is converted by a defined method into a standard user-item rating matrix, which addresses the problem of implicit user feedback in e-commerce systems.

(2) A comprehensive similarity is introduced, composed of the similarity between items (computed from users' historical behavior) and a similarity based on the product descriptions. The description text of each product is segmented into words, with stop words and custom words added to the segmentation, and converted into a TF-IDF vector so that the content similarity between items can be calculated. This content similarity is then used, in a defined way, to correct the traditional item-to-item similarity. This approach alleviates the data sparsity and cold-start problems and improves the accuracy and recall of the recommendation results.

(3) To address scalability, stability, and real-time requirements, the improved algorithm is parallelized on the Spark platform, and an e-commerce recommendation system is designed and implemented using the parallel algorithm. Finally, a demographic statistics method is used to mitigate the user cold-start problem in the system.

The experimental results show that the improved algorithm based on comprehensive similarity improves both accuracy and recall. A comparison of the algorithm's execution time on different numbers of nodes verifies that Spark brings a significant performance improvement, so the system can handle massive data effectively and return recommendation results quickly.
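Contribution (1) does not specify how behaviors are mapped to ratings. The Python sketch below shows one common way to do it, assuming hypothetical behavior types (`click`, `favorite`, `cart`, `purchase`) with made-up weights that are summed per user-item pair and capped at 5. The names `BEHAVIOR_WEIGHTS`, `MAX_RATING`, and `implicit_to_ratings` are illustrative and are not taken from the thesis.

```python
from collections import defaultdict

# Hypothetical weights per behavior type. The thesis converts implicit
# feedback into ratings "by a defined method" that is not given here,
# so these numbers are illustrative only.
BEHAVIOR_WEIGHTS = {"click": 1.0, "favorite": 2.0, "cart": 3.0, "purchase": 4.0}
MAX_RATING = 5.0


def implicit_to_ratings(events):
    """Turn (user, item, behavior) events into a user -> {item: rating} map.

    Weights of all events for the same (user, item) pair are summed and the
    total is capped at MAX_RATING; unknown behavior types are ignored.
    """
    scores = defaultdict(lambda: defaultdict(float))
    for user, item, behavior in events:
        weight = BEHAVIOR_WEIGHTS.get(behavior)
        if weight is None:
            continue
        scores[user][item] += weight
    return {
        user: {item: min(total, MAX_RATING) for item, total in items.items()}
        for user, items in scores.items()
    }


events = [
    ("u1", "i1", "click"),
    ("u1", "i1", "purchase"),
    ("u1", "i2", "click"),
    ("u2", "i1", "cart"),
    ("u2", "i1", "purchase"),
    ("u2", "i3", "share"),  # not in BEHAVIOR_WEIGHTS, so it is dropped
]
ratings = implicit_to_ratings(events)
print(ratings)
```

Capping keeps converted scores on a bounded rating scale however many events a user generates. On Spark, the same aggregation would correspond to mapping events to ((user, item), weight) pairs and combining them with `reduceByKey`.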
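The content-similarity step in contribution (2) can be illustrated with a plain TF-IDF and cosine computation. This is a minimal single-machine sketch that assumes English descriptions split on whitespace and a tiny hypothetical stop-word list. The thesis instead uses a word segmenter with its own stop words and custom words, which this sketch does not reproduce. The idf used here is log(N / df), one of several standard variants.

```python
import math
from collections import Counter

# Hypothetical stop-word list standing in for the thesis's custom one.
STOP_WORDS = {"the", "a", "for", "and", "with"}


def tokenize(text):
    """Whitespace tokenizer standing in for a real word segmenter."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]


def tfidf_vectors(descriptions):
    """Return item -> {term: tf-idf weight}, with tf = count / length."""
    tokens = {item: tokenize(text) for item, text in descriptions.items()}
    n_docs = len(tokens)
    # Document frequency: in how many descriptions each term appears.
    df = Counter(term for terms in tokens.values() for term in set(terms))
    vectors = {}
    for item, terms in tokens.items():
        tf = Counter(terms)
        total = len(terms) or 1
        vectors[item] = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


descriptions = {
    "i1": "wireless bluetooth headphones with noise cancelling",
    "i2": "bluetooth speaker with deep bass",
    "i3": "cotton t-shirt for summer",
}
vecs = tfidf_vectors(descriptions)
print(cosine(vecs["i1"], vecs["i2"]))  # small but positive: shared "bluetooth"
print(cosine(vecs["i1"], vecs["i3"]))  # no shared terms
```

Items whose descriptions share no weighted terms get a content similarity of 0, and a term that occurs in every description gets an idf of 0, so it contributes nothing.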
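The comprehensive similarity of contribution (2) combines a rating-based item similarity with the content similarity. The combination formula is not stated in this abstract, so the sketch below assumes a hypothetical linear blend, sim = alpha * cos_rating + (1 - alpha) * sim_content with alpha = 0.7. It then predicts a user's rating for an item as the similarity-weighted average of that user's existing ratings, which is the classic item-based prediction. The content-similarity values are hard-coded hypothetical numbers rather than computed from text.

```python
import math

# Hypothetical blending weight; the thesis's exact correction formula for
# the traditional item similarity is not given here.
ALPHA = 0.7


def item_cosine(ratings, i, j):
    """Cosine similarity between the rating columns of items i and j.

    ratings maps user -> {item: rating}; a missing rating counts as 0.
    """
    dot = sum(r[i] * r[j] for r in ratings.values() if i in r and j in r)
    norm_i = math.sqrt(sum(r[i] ** 2 for r in ratings.values() if i in r))
    norm_j = math.sqrt(sum(r[j] ** 2 for r in ratings.values() if j in r))
    if norm_i == 0 or norm_j == 0:
        return 0.0
    return dot / (norm_i * norm_j)


def comprehensive_similarity(ratings, content_sim, i, j, alpha=ALPHA):
    """alpha * rating similarity + (1 - alpha) * content similarity."""
    content = content_sim.get((i, j), content_sim.get((j, i), 0.0))
    return alpha * item_cosine(ratings, i, j) + (1 - alpha) * content


def predict(ratings, content_sim, user, item, alpha=ALPHA):
    """Similarity-weighted average of the user's ratings on other items."""
    num = den = 0.0
    for seen, rating in ratings.get(user, {}).items():
        if seen == item:
            continue
        s = comprehensive_similarity(ratings, content_sim, item, seen, alpha)
        num += s * rating
        den += abs(s)
    return num / den if den > 0 else 0.0


ratings = {
    "u1": {"i1": 5.0, "i2": 1.0},
    "u2": {"i1": 5.0, "i3": 4.0},
    "u3": {"i2": 2.0, "i3": 5.0},
}
# i4 is a new item with no ratings yet; only its content similarity is known.
content_sim = {("i1", "i4"): 0.8, ("i2", "i4"): 0.1}

print(comprehensive_similarity(ratings, content_sim, "i1", "i3"))
print(predict(ratings, content_sim, "u1", "i4"))
```

Because the new item `i4` has no ratings, its rating-based similarity is 0, yet the content term still lets it receive a predicted score. This is how a content component can ease item cold start. In a distributed setting, the pairwise similarities would typically be precomputed in parallel rather than on demand as here.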
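For the user cold-start step in contribution (3), a demographic approach recommends to a new user the items that are popular among existing users with similar attributes. The sketch below is an assumption about how such a method could work, not the thesis's implementation: users are grouped by a hypothetical (age band, gender) key, an item counts as liked when rated at least 4, and the function falls back to overall popularity when the group has no data. `demographic_recommend` and its parameters are illustrative names.

```python
from collections import Counter


def demographic_recommend(ratings, profiles, new_user_group, n=2, min_rating=4.0):
    """Recommend items most often rated highly by users in the same group.

    ratings: user -> {item: rating}; profiles: user -> demographic group key.
    Falls back to overall counts if nobody in the group liked anything.
    """
    by_group = Counter()
    overall = Counter()
    for user, items in ratings.items():
        for item, rating in items.items():
            if rating >= min_rating:
                overall[item] += 1
                if profiles.get(user) == new_user_group:
                    by_group[item] += 1
    counts = by_group if by_group else overall
    # Sort by descending count, breaking ties by item id for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]


ratings = {
    "u1": {"i1": 5.0, "i2": 4.0},
    "u2": {"i1": 4.0, "i3": 5.0},
    "u3": {"i3": 5.0, "i4": 4.5},
    "u4": {"i4": 5.0, "i3": 2.0},
}
profiles = {
    "u1": ("18-25", "F"),
    "u2": ("18-25", "F"),
    "u3": ("26-35", "M"),
    "u4": ("26-35", "M"),
}
print(demographic_recommend(ratings, profiles, ("18-25", "F")))
print(demographic_recommend(ratings, profiles, ("36-50", "M")))  # no group data
```

The overall-popularity fallback keeps the function useful even for a new user whose demographic group has no history yet.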
Keywords/Search Tags: Collaborative filtering algorithm, Implicit feedback, Comprehensive similarity, Spark, Recommendation system