Research And Implementation Of Hybrid Recommendation Algorithm Based On Spark Platform

Posted on: 2017-02-01
Degree: Master
Type: Thesis
Country: China
Candidate: W T Zhou
GTID: 2308330485459827
Subject: Software engineering
Abstract/Summary:
With the rapid development of Internet technology, the volume of online information has grown exponentially, making it increasingly difficult for people to extract the information they actually need. Recommendation systems are one of the most effective ways to address this problem. Collaborative filtering, a branch of recommendation systems, is a classical and widely used technique that provides personalized recommendations based on users' preferences. However, it suffers from several problems, notably data sparsity and limited scalability. Sparsity arises mainly from a lack of data: each user rates only a small fraction of the available items. Scalability, meanwhile, depends mainly on parallelizing the algorithm and distributing its computation across a cluster.

To address these problems, this thesis proposes a novel hybrid recommendation algorithm, implemented on a Spark cluster and evaluated on the MovieLens dataset. First, the Co-Clustering with Augmented Matrix model is used to obtain user clusters and movie clusters simultaneously. Then the distance between each user (movie) and each user (movie) cluster is calculated based on the K-L (Kullback-Leibler) divergence. Cosine similarity is used to measure the similarity between users (movies), and a set of similar users (movies) is selected for each user (movie) from the appropriate user (movie) clusters identified by these distances. Next, based on these similar sets, a user-based method and a movie-based method are applied separately to obtain two predictions of a user's rating for an unrated movie. Finally, the two predictions are merged by a linear combination to improve accuracy.

Experimental results show that the proposed Spark-based hybrid algorithm improves recommendation accuracy on a sparse dataset. It also has clear advantages in speed and scalability, because the algorithm involves many iterative processes, which Spark handles efficiently.
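The K-L divergence step can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes each user (or movie) and each cluster is summarized as a vector of non-negative rating weights over the opposite-side clusters, that the divergence is taken as D(profile || cluster), and that a small smoothing constant `eps` keeps every probability non-zero. The function names `normalize`, `kl_divergence` and `rank_clusters` are hypothetical.

```python
import math


def normalize(weights, eps=1e-9):
    """Turn non-negative weights into a probability distribution.
    eps (a hypothetical smoothing choice) keeps every probability > 0
    so the K-L divergence below stays finite."""
    smoothed = [w + eps for w in weights]
    total = sum(smoothed)
    return [w / total for w in smoothed]


def kl_divergence(p, q):
    """D_KL(p || q) = sum_i p_i * ln(p_i / q_i) for strictly positive p, q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def rank_clusters(profile, cluster_profiles):
    """Rank clusters by K-L divergence from one user's (or movie's) profile,
    nearest (smallest divergence) first."""
    p = normalize(profile)
    scores = [(cid, kl_divergence(p, normalize(prof)))
              for cid, prof in cluster_profiles.items()]
    return sorted(scores, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical profiles: rating mass over three movie clusters.
    user = [4.0, 0.0, 1.0]
    user_clusters = {"A": [10, 1, 2], "B": [1, 8, 6]}
    for cid, d in rank_clusters(user, user_clusters):
        print(cid, round(d, 3))
```

The nearest clusters returned here are the ones from which similar users (movies) would then be drawn. On Spark, a per-profile computation like `rank_clusters` is the kind of step that would typically run inside a `map` over an RDD of users; the sketch keeps it in plain Python for clarity.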
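The user-based and movie-based predictions and their linear combination can be sketched in the same spirit. This is an illustration under stated assumptions, not the thesis's exact method: ratings are stored as sparse `{key: rating}` dictionaries, each prediction is a plain cosine-similarity-weighted average of the neighbours' ratings, and the combination weight `lam` is a hypothetical value, since the abstract gives neither the prediction formula nor the weight. All function names are hypothetical. Because the computation is symmetric, one `weighted_prediction` function serves both directions.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse rating vectors stored as {key: rating}.
    Missing entries count as 0, so only co-rated keys add to the dot product."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[k] * b[k] for k in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def weighted_prediction(target, neighbours, key):
    """Similarity-weighted average of the neighbours' ratings for `key`.

    User-based: target = a user's {movie: rating}, neighbours = similar users,
    key = the unrated movie. Movie-based: target = a movie's {user: rating},
    neighbours = similar movies, key = the user. Returns None when no
    positively similar neighbour has rated `key`.
    """
    num = den = 0.0
    for nb in neighbours:
        if key in nb:
            s = cosine(target, nb)
            if s > 0:
                num += s * nb[key]
                den += s
    return num / den if den > 0 else None


def hybrid(p_user, p_item, lam=0.5):
    """Linear combination lam * p_user + (1 - lam) * p_item (lam is hypothetical).
    Falls back to whichever prediction exists if the other is None."""
    if p_user is None:
        return p_item
    if p_item is None:
        return p_user
    return lam * p_user + (1 - lam) * p_item


if __name__ == "__main__":
    # Toy data: predict user u1's rating for movie m3.
    u1 = {"m1": 5, "m2": 3}
    similar_users = [{"m1": 4, "m2": 3, "m3": 4}, {"m1": 1, "m2": 5, "m3": 2}]
    p_user = weighted_prediction(u1, similar_users, "m3")

    m3 = {"u2": 4, "u3": 2}  # ratings of m3, keyed by user
    similar_movies = [{"u1": 5, "u2": 4, "u3": 1}, {"u1": 3, "u2": 3, "u3": 5}]
    p_item = weighted_prediction(m3, similar_movies, "u1")

    print(p_user, p_item, hybrid(p_user, p_item))
```

In practice `lam` would be tuned on held-out ratings; the fallback in `hybrid` matters on sparse data, where one side may have no usable neighbours for a given user-movie pair.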
Keywords: Recommendation System, Collaborative Filtering, Co-Clustering, Similarity Calculation, Spark Platform