
The Research Of Recommendation System Based On Spark Platform

Posted on: 2016-05-05
Degree: Master
Type: Thesis
Country: China
Candidate: Z W Yang
GTID: 2298330470957785
Subject: Control Science and Engineering
Abstract/Summary:
The rapid development of the modern Internet generates a large amount of valuable information, and finding useful information in massive data is a problem of great significance. Big data platforms have been studied and developed against this background. The birth of Hadoop drew attention to the MapReduce computing model, while Spark suits big data mining scenarios by introducing the RDD data model and in-memory computing. Spark performs better than Hadoop in iterative computations and has quickly become a research priority for many enterprises and scholars. A recommendation system is an application that finds useful information in massive user behavior data and offers it to users. Implementing the recommendation algorithm of such a system is an important part of data mining. An implementation on a traditional computer is very time-consuming and cannot meet current business needs. Parallelization on a distributed computing platform can effectively solve this problem. Moreover, recommendation algorithms involve many iterative computations, and Spark meets exactly this parallelization need.

In view of the growing number of applications based on the Spark platform at home and abroad, this thesis studies recommendation algorithm technologies on the Spark platform. It mainly covers the following aspects:

(1) Parallelization of recommendation algorithms on the Spark platform. Building on related technical research into the Spark platform and recommendation systems, the thesis first designs the process for parallelizing recommendation algorithms on Spark and analyzes in detail the roles of the cluster nodes and the distribution of tasks after an algorithm is submitted. Second, it implements the parallelized recommendation algorithms on Spark, namely user-based collaborative filtering, item-based collaborative filtering, and the ALS-model-based recommendation algorithm, and describes and analyzes each parallel implementation in detail. Third, it uses case studies to analyze in detail how Spark parallelizes data and tasks when the algorithms run.

(2) Optimization of parallelization on the Spark platform. The optimization covers two aspects: platform optimization and recommendation algorithm optimization. For the case where the Spark cluster nodes are heterogeneous and task scheduling is unreasonable during parallel execution, the thesis proposes a self-adaptive task scheduling strategy for heterogeneous Spark clusters (HSATS). For recommendation algorithms based on collaborative filtering, it proposes vectorizing the implicit tag attributes of users or items and integrating the resulting vectors into the similarity computation. For the ALS-model-based recommendation algorithm, it designs a new loss function and integrates the proximity information of users and items before model training.

The experimental results indicate that Spark performs better than Hadoop when parallelizing recommendation algorithms that need multiple iterations. On heterogeneous Spark clusters, the HSATS self-adaptive task scheduling strategy finishes tasks in less time and uses the resources of the cluster nodes more reasonably. The proposed recommendation algorithm optimizations improve the evaluation metrics of the recommendation system.
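As a rough illustration of the item-based collaborative filtering mentioned in the abstract, the sketch below computes cosine similarities between items and a similarity-weighted rating prediction. It is not the thesis's implementation: it runs in plain Python on a single machine rather than on Spark, the function names `item_similarities` and `predict` and the toy `ratings` data are hypothetical, and the choice of cosine similarity is an assumption, since the abstract does not state which similarity measure is used. The grouping step and the pairwise accumulation step mirror, loosely, the group-by-key and reduce-by-key stages a distributed version would need.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def item_similarities(ratings):
    """Return {(item_a, item_b): cosine similarity} for (user, item, rating) triples.

    Pair keys are ordered so that item_a < item_b.
    """
    # Grouping step: collect each user's ratings (akin to a group-by-key on user).
    by_user = defaultdict(dict)
    for user, item, rating in ratings:
        by_user[user][item] = rating

    # Accumulation step: sum co-rating products per item pair and
    # squared ratings per item (akin to a reduce-by-key).
    dot = defaultdict(float)
    norm_sq = defaultdict(float)
    for items in by_user.values():
        for item, r in items.items():
            norm_sq[item] += r * r
        for a, b in combinations(sorted(items), 2):
            dot[(a, b)] += items[a] * items[b]

    sims = {}
    for (a, b), d in dot.items():
        denom = sqrt(norm_sq[a]) * sqrt(norm_sq[b])
        if denom > 0:  # skip items whose ratings are all zero
            sims[(a, b)] = d / denom
    return sims


def predict(user_ratings, target, sims):
    """Similarity-weighted average of the user's ratings, or None if no neighbor exists."""
    num = den = 0.0
    for item, r in user_ratings.items():
        if item == target:
            continue
        key = (min(item, target), max(item, target))
        s = sims.get(key, 0.0)
        num += s * r
        den += abs(s)
    return num / den if den else None


# Toy data, invented purely for illustration.
ratings = [
    ("u1", "A", 5), ("u1", "B", 3),
    ("u2", "A", 4), ("u2", "B", 2), ("u2", "C", 5),
    ("u3", "B", 4), ("u3", "C", 4),
]
sims = item_similarities(ratings)
estimate = predict({"A": 5, "B": 3}, "C", sims)  # u1's estimated rating for item C
```

Because the prediction is a weighted average of ratings the user has already given, it always falls between the lowest and highest of those ratings. Iterative methods such as ALS, where Spark's in-memory model is reported to help most, are not covered by this sketch.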
Keywords/Search Tags: Big Data, Spark, Recommendation System, Parallelization, Collaborative Filtering