Research And Implementation Of Parallel Recommendation Algorithm Based On Spark

Posted on: 2017-01-21
Degree: Master
Type: Thesis
Country: China
Candidate: F F Zheng
Full Text: PDF
GTID: 2308330485472129
Subject: Software engineering
Abstract/Summary:
With the rapid development of the Internet, the integration and interaction between information technology and the economy and society have led to rapid growth in data, and we have clearly entered the era of big data. Users can access extremely rich information, but they also have to spend a lot of time screening massive data for the information they need, so information overload is increasingly apparent. Recommendation technology, a key technique for addressing information overload, has been widely used in fields such as e-commerce and music and video sites, but open issues remain. In particular, with massive data, traditional collaborative filtering algorithms encounter a scalability problem and cannot meet the computing needs of recommendation. Distributed parallel computing frameworks offer a solution to this problem. Spark, with the advantages of in-memory computing, has become a major focus of research in data processing over the past two years. Implementing parallel recommendation algorithms on the Spark platform therefore has important practical value for addressing the challenges of big data and improving the efficiency of recommendation algorithms. This thesis designs and implements improved recommendation algorithms on Spark. The main work is as follows:
(1) The thesis analyzes the principles and disadvantages of the Slope One and item-based collaborative filtering algorithms.
(2) To address the low prediction accuracy of the item-based collaborative filtering algorithm when rating data is sparse, the thesis introduces item attribute similarity and linearly combines it with rating similarity when computing the similarity between items, reducing the negative impact of sparse data on the similarity calculation. Experimental results show that the improved algorithm increases prediction accuracy when the data is sparse.
(3) For the Slope One algorithm, the thesis introduces a new prediction method that combines the correlation between items and users with the similarity between items. This addresses a shortcoming of Slope One, which relies only on the ratings the same user gives to different items and does not consider item or user similarity. Experimental results show that the improved algorithm increases prediction accuracy when the data is relatively dense.
(4) The improved item-based algorithm and the improved Slope One algorithm are parallelized step by step with the Spark programming model. The parallel efficiency of the two parallel algorithms is validated with the Speedup and Sizeup indicators. Experimental results show that the parallel algorithms achieve good parallel performance and high efficiency when processing large data, and that they solve the scalability problem.
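The linear combination of attribute similarity and rating similarity described in item (2) can be illustrated with a minimal single-machine sketch. The abstract does not say which similarity measures or weight the thesis uses, so everything here is an assumption made for illustration: cosine similarity over co-rating users, Jaccard similarity over attribute sets, the weight `alpha`, and all function names are hypothetical, and no Spark parallelization is shown.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rating_similarity(ratings_i, ratings_j):
    """Rating-based similarity of two items (assumed measure: cosine).

    ratings_i, ratings_j: dict mapping user id -> rating for each item.
    Only users who rated both items contribute; with no such users the
    similarity is 0.0, which is the sparse-data case the thesis targets.
    """
    common = sorted(set(ratings_i) & set(ratings_j))
    if not common:
        return 0.0
    return cosine([ratings_i[u] for u in common],
                  [ratings_j[u] for u in common])


def attribute_similarity(attrs_i, attrs_j):
    """Attribute-based similarity of two items (assumed measure: Jaccard).

    attrs_i, attrs_j: sets of attribute labels, e.g. genres.
    """
    union = attrs_i | attrs_j
    if not union:
        return 0.0
    return len(attrs_i & attrs_j) / len(union)


def blended_similarity(ratings_i, ratings_j, attrs_i, attrs_j, alpha=0.5):
    """Linear combination of the two similarities.

    alpha is a hypothetical weight in [0, 1]: alpha = 0 uses ratings only,
    alpha = 1 uses attributes only.
    """
    return (alpha * attribute_similarity(attrs_i, attrs_j)
            + (1 - alpha) * rating_similarity(ratings_i, ratings_j))
```

When two items share no co-rating users, the rating term is zero and the blended score falls back to `alpha` times the attribute similarity, so sparsely rated items can still be compared.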
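The abstract names the Speedup and Sizeup indicators without defining them. The definitions below are the ones commonly used when evaluating parallel algorithms and are given here as an assumption, not as the thesis's own formulas. The symbols $T_m(D)$ (run time on $m$ nodes for a dataset of size $D$), $m$, and $n$ are introduced only for this illustration.

```latex
% Speedup: fix the dataset D and increase the number of nodes from 1 to m.
\mathrm{Speedup}(m) = \frac{T_1(D)}{T_m(D)}

% Sizeup: fix the number of nodes m and enlarge the dataset by a factor n.
\mathrm{Sizeup}(n) = \frac{T_m(n \cdot D)}{T_m(D)}
```

Under these definitions, a speedup close to $m$ indicates near-linear scaling with cluster size, and a sizeup that grows no faster than $n$ indicates that run time scales at most linearly with data volume.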
Keywords/Search Tags: Spark, Parallelization, Slope One, Item-based collaborative filtering