Research And Implementation Of Distributed Machine Learning Platform On Spark

Posted on:2018-10-30

Degree:Master

Type:Thesis

Country:China

Candidate:R Xiang

Full Text:PDF

GTID:2428330512998182

Subject:Computer Science and Technology

Abstract/Summary:

With the widespread use of computers and networks across industries, it is becoming easier to collect and manage data, and this big data plays an important role in market analysis, decision-making, artificial intelligence and other fields. Machine learning is the common means of analyzing and mining big data and a core research direction of artificial intelligence, so building an efficient platform for large-scale machine learning is very important. In this paper, we introduce LIBBLE-Spark, a distributed machine learning platform that we designed and implemented. LIBBLE-Spark is built on Spark and takes full advantage of Spark's in-memory computing. Applications built on it can be integrated deeply into the Spark data processing pipeline, which reduces data I/O time. LIBBLE-Spark comprises the following three main pieces of work:

1. Based on Spark, we implement several regression, classification and clustering algorithms. For regression and classification, we use SCOPE to optimize the model, because its logic and framework are well suited to Spark; as a result, the implementations converge quickly and require little communication. LIBBLE-Spark also provides an interface for user-defined generalized linear models, so that users can freely and quickly optimize their own models with the built-in optimization cores.
2. We optimize the implementations for sparse learning. For learning sparse models, we propose a trick called Lazy Shrinkage for L1 regularization, which effectively reduces the computational overhead on high-dimensional sparse data.
3. For production environments, or clusters whose machines can be preempted, we propose a synchronization protocol called Partial Synchronous Parallel (PSP) and implement it on Spark. PSP effectively reduces waiting time and accelerates optimization without affecting the convergence of the algorithm.
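The abstract does not describe how Lazy Shrinkage works internally. The following Python sketch illustrates the general idea of deferring L1 shrinkage on sparse data, under stated assumptions: plain proximal SGD with a constant step size on logistic loss (not SCOPE), and hypothetical function names `soft_threshold`, `loss_coef`, `eager_l1_sgd` and `lazy_l1_sgd`. The key fact is that k consecutive soft-thresholding steps by ηλ equal one step by kηλ, so a coordinate absent from a sample, whose loss gradient is zero, can have its shrinkage postponed until it is next touched. Each step then costs time proportional to the sample's nonzeros instead of the full dimension. The sketch covers only this plain SGD case; in variance-reduced methods, untouched coordinates also receive a dense full-gradient term, which makes the catch-up more involved.

```python
import math


def soft_threshold(x, t):
    """Proximal operator of t * |x|: move x toward zero by t, clipping at zero."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def loss_coef(margin, y):
    """Derivative of log(1 + exp(-margin)) w.r.t. w.x, where margin = y * w.x.

    Equals -y / (1 + exp(margin)), computed without overflow.
    """
    if margin >= 0:
        e = math.exp(-margin)
        return -y * e / (1.0 + e)
    return -y / (1.0 + math.exp(margin))


def eager_l1_sgd(samples, dim, lr, lam, epochs=1):
    """Reference proximal SGD: shrinks every coordinate at every step, O(dim) per step."""
    w = [0.0] * dim
    for _ in range(epochs):
        for feats, y in samples:
            coef = loss_coef(y * sum(w[j] * v for j, v in feats.items()), y)
            grad = [0.0] * dim
            for j, v in feats.items():
                grad[j] = coef * v
            w = [soft_threshold(w[j] - lr * grad[j], lr * lam) for j in range(dim)]
    return w


def lazy_l1_sgd(samples, dim, lr, lam, epochs=1):
    """Same iterates as eager_l1_sgd, but each step only touches a sample's nonzeros.

    samples: list of (features, label), features a dict {index: value}, label in {-1, +1}.
    """
    w = [0.0] * dim
    done = [0] * dim  # number of steps whose shrinkage is already applied to w[j]
    t = 0             # number of completed steps
    for _ in range(epochs):
        for feats, y in samples:
            # Catch up: k skipped shrink steps of lr*lam equal one shrink of k*lr*lam.
            for j in feats:
                w[j] = soft_threshold(w[j], (t - done[j]) * lr * lam)
                done[j] = t
            coef = loss_coef(y * sum(w[j] * v for j, v in feats.items()), y)
            # Gradient step plus this step's shrinkage, on touched coordinates only.
            for j, v in feats.items():
                w[j] = soft_threshold(w[j] - lr * coef * v, lr * lam)
                done[j] = t + 1
            t += 1
    # Flush the shrinkage still owed to coordinates not touched recently.
    for j in range(dim):
        w[j] = soft_threshold(w[j], (t - done[j]) * lr * lam)
    return w
```

Run on the same sparse dataset, both functions should return the same weights up to floating-point rounding. The inner loop of `lazy_l1_sgd` visits only the sample's nonzero coordinates, plus one final O(dim) flush.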

Keywords/Search Tags:

Machine Learning, Platform, Distributed, Spark, Sparse Learning

Related items

1	Research And Implementation Of Distributed Machine Learning Platform Based On Spark And Pu-learning
2	Design And Implementation Of Machine Learning Platform Based On Spark
3	Research And Implementation Of Distributed Machine Learning Algorithms Orchestration System For Big Data Processing
4	Research And Implementation Of Spark Application Performance Prediction Model Based On Machine Learning
5	Analysis And Research Of Machine Learning Model Based On Spark
6	Research And Implementation Of Unified Large Data Mining Service Platform Based On Spark MLlib
7	A High-Performance Chinese Distributed Computing System (CH-Spark)
8	Research And Implementation Of Drag And Drop Machine Learning Platform Based On Spark And Keras
9	Research On Distributed Manifold Learning Algorithm Based On Spark
10	Research Of Heterogeneous Multi-task Learning And Efficiency Of Task Grouping