A Study On Spark-based Distributed Collaborative Filtering And Its Tools

Posted on: 2018-04-14
Degree: Master
Type: Thesis
Country: China
Candidate: L Zhao
Full Text: PDF
GTID: 2348330512998640
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Mobile Internet and the Internet of Things, the amount of data collected by human beings is growing exponentially, and distributed computing has become an indispensable technology for big data processing and analysis. By decomposing a complex task into multiple subproblems that can be executed concurrently on multiple interconnected nodes, distributed computing overcomes the single-node bottleneck and poor scalability of traditional algorithms. As a result, distributed machine learning algorithms have become a research focus in both industry and academia.

Among the many distributed computing frameworks, Spark is widely used because of its fault tolerance, scalability, and ease of use. However, the analysis and comparison of the complexity of distributed algorithms still lack a unified framework, so the scalability and performance of specific algorithms on the Spark platform can only be evaluated empirically.

Based on a study of the Spark distributed platform, this paper proposes a framework for analyzing the complexity of distributed algorithms on Spark, using Spark-based collaborative filtering as the application scenario. The results show that the framework can effectively guide algorithm development and runtime environment configuration. Specifically, this paper makes the following contributions:

Firstly, this paper introduces distributed computing and collaborative filtering technology. The distributed computing section gives a detailed account of the computing models, execution models, and design concepts of the popular Hadoop and Spark distributed computing platforms, together with an analysis and explanation of their principles. The collaborative filtering section analyzes memory-based collaborative filtering and collaborative filtering based on matrix factorization, and introduces a variety of classical algorithms.

Then, this paper proposes a complexity analysis framework for distributed algorithms on Spark, and uses it to analyze a variety of Spark-based distributed collaborative filtering algorithms.

Finally, this paper designs a Spark-based data mining toolbox. By packaging data mining algorithms as configurable components and providing a configuration-based development model for data analysis applications, the toolbox addresses the difficulty analysts face in using Spark directly. With this toolbox, users without programming skills can easily apply various distributed data mining algorithms to large amounts of data. The functionality and development process of the toolbox are described in detail.
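To make the matrix-factorization branch of collaborative filtering concrete, the following is a minimal sketch using Spark MLlib's ALS (alternating least squares) estimator, one standard way to implement this technique on Spark; the thesis does not specify its exact algorithms, and the toy rating triples, application name, and parameter values here are illustrative assumptions only:

```scala
import org.apache.spark.ml.recommendation.ALS
import org.apache.spark.sql.SparkSession

object AlsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cf-als-sketch") // hypothetical application name
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Toy (user, item, rating) triples standing in for a real ratings dataset.
    val ratings = Seq(
      (0, 0, 4.0f), (0, 1, 2.0f),
      (1, 1, 3.0f), (1, 2, 5.0f),
      (2, 0, 1.0f), (2, 2, 4.0f)
    ).toDF("userId", "itemId", "rating")

    // Matrix factorization: learn rank-k latent factor vectors for users
    // and items by alternating least squares; all parameters are examples.
    val als = new ALS()
      .setRank(10)
      .setMaxIter(10)
      .setRegParam(0.1)
      .setUserCol("userId")
      .setItemCol("itemId")
      .setRatingCol("rating")

    val model = als.fit(ratings)

    // A predicted rating is the dot product of a user factor and an item factor.
    model.transform(ratings).show()

    spark.stop()
  }
}
```

Because ALS distributes the user and item factor matrices across the cluster and solves each least-squares subproblem in parallel, it is a natural subject for the kind of per-algorithm complexity analysis the proposed framework targets.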
Keywords: Spark, Collaborative Filtering, Distributed Computing