The Research And Implementation Of Collaborative Filtering Algorithm On Spark Platform

Posted on: 2017-05-05
Degree: Master
Type: Thesis
Country: China
Candidate: B W Zheng
GTID: 2348330509461200
Subject: Engineering
Abstract/Summary:
With the rapid development of Internet technology, the era of big data has arrived. How to derive valuable information from the Internet's massive and rich data resources has become an urgent problem. Personalized recommender systems are one effective way to address information overload. They analyze patterns of user interest in items or products in order to recommend items and services that suit a user's taste. In traditional collaborative filtering, processing large-scale user behavior data on a single-machine platform takes a long time. Parallelizing the traditional collaborative filtering algorithms is an effective solution to this problem. Spark is an in-memory distributed computing framework that is well suited to iterative machine learning algorithms. Parallelizing the collaborative filtering algorithms that require iterative computation on the Spark platform can greatly improve the efficiency of recommendation algorithms.
In this paper, we analyze and implement several collaborative filtering algorithms on the Spark platform. First, we introduce the Spark platform and several common recommendation algorithms, including item-based collaborative filtering and the latent factor model. For the item-based algorithm, we implement correlation-based similarity, adjusted cosine similarity, and bias-based similarity on Spark. By adding a penalty factor to the rating prediction formula, we address the loss of accuracy that occurs when neighborhood information is insufficient. For the latent factor model, we use alternating least squares to solve the matrix factorization problem. In addition, we build a bipartite graph model of the user-item relation with Spark's GraphX component and update the user factor matrix and the item factor matrix alternately through graph-parallel computation, which greatly improves the efficiency of the algorithm.
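The alternating update of user and item factors can be illustrated with a minimal single-machine sketch. It is not the GraphX implementation described in the abstract: it assumes a rank-1 factorization, plain Python lists, and an L2 regularization weight `reg`, none of which are specified in the source. The names `als_rank1` and `objective` are hypothetical.

```python
# Rank-1 alternating least squares in plain Python (illustrative only, no Spark).
# Assumption: each rating r_ui is approximated by p[u] * q[i], with L2 penalty reg > 0.

def als_rank1(ratings, n_users, n_items, reg=0.1, iters=10):
    """ratings: list of (user, item, rating) triples with 0-based ids."""
    p = [1.0] * n_users  # user factors
    q = [1.0] * n_items  # item factors
    for _ in range(iters):
        # Hold item factors fixed; each user factor is a 1-D ridge regression
        # with closed form p_u = sum(q_i * r_ui) / (reg + sum(q_i^2)).
        num = [0.0] * n_users
        den = [reg] * n_users
        for u, i, r in ratings:
            num[u] += q[i] * r
            den[u] += q[i] * q[i]
        p = [n / d for n, d in zip(num, den)]

        # Hold user factors fixed; update each item factor the same way.
        num = [0.0] * n_items
        den = [reg] * n_items
        for u, i, r in ratings:
            num[i] += p[u] * r
            den[i] += p[u] * p[u]
        q = [n / d for n, d in zip(num, den)]
    return p, q


def objective(ratings, p, q, reg):
    """Regularized squared error that each half-step above minimizes exactly."""
    err = sum((r - p[u] * q[i]) ** 2 for u, i, r in ratings)
    return err + reg * (sum(x * x for x in p) + sum(x * x for x in q))
```

Because each half-step solves its subproblem exactly, the regularized objective cannot increase from one iteration to the next. In a graph-parallel setting the two accumulation loops would correspond to aggregating messages along user-item edges, though how the thesis maps this onto GraphX is not stated in the abstract.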
Finally, this paper implements a linear blend of the item-based algorithm with the latent factor model and proposes a linear model for blending several collaborative filtering algorithms. We use least squares to compute the weight of each algorithm in the blended model, so that the model learns the importance of each algorithm automatically from the training dataset. The blended model is shown to improve the accuracy of rating prediction.
Multiple groups of comparative experiments on the MovieLens dataset show that, even when running on a single node, the collaborative filtering algorithm based on alternating least squares and implemented with GraphX runs far faster than the traditional algorithm implemented on a single-machine platform. In addition, adding the penalty factor to the rating prediction formula greatly improves the accuracy of the item-based collaborative filtering algorithm. Finally, the blended model proposed in this paper further improves accuracy over the existing algorithms and allows combinations of multiple collaborative filtering algorithms to adapt to different situations.
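The least-squares blending idea can be sketched for the simplest case of two predictors. This is an assumption-laden illustration, not the thesis's model: it blends exactly two prediction lists, omits an intercept term, and solves the 2x2 normal equations directly. The function names `blend_weights` and `blend` are hypothetical.

```python
# Least-squares weights for blending two rating predictors (illustrative only).
# Assumption: blended rating = w_a * pred_a + w_b * pred_b, with no intercept.

def blend_weights(pred_a, pred_b, truth):
    """Return (w_a, w_b) minimizing the sum of (truth - w_a*a - w_b*b)^2."""
    saa = sum(a * a for a in pred_a)
    sbb = sum(b * b for b in pred_b)
    sab = sum(a * b for a, b in zip(pred_a, pred_b))
    sar = sum(a * r for a, r in zip(pred_a, truth))
    sbr = sum(b * r for b, r in zip(pred_b, truth))

    # Normal equations: [[saa, sab], [sab, sbb]] @ [w_a, w_b] = [sar, sbr].
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("predictors are collinear; weights are not unique")
    w_a = (sar * sbb - sab * sbr) / det
    w_b = (saa * sbr - sab * sar) / det
    return w_a, w_b


def blend(pred_a, pred_b, w_a, w_b):
    """Apply the learned weights to produce blended predictions."""
    return [w_a * a + w_b * b for a, b in zip(pred_a, pred_b)]
```

On the training ratings, the blended squared error cannot exceed that of either predictor alone, since the weight pairs (1, 0) and (0, 1) are both candidates in the minimization. Blending more than two algorithms would require solving a larger linear system in the same way.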
Keywords: Collaborative filtering algorithm, Spark platform, Graph-parallel computation, Blended model