Research And Implementation Of Distributed Machine Learning Platform On Spark

Posted on: 2018-10-30
Degree: Master
Type: Thesis
Country: China
Candidate: R Xiang
GTID: 2428330512998182
Subject: Computer Science and Technology
Abstract/Summary:
With the widespread use of computers and networks across industries, it is becoming easier to collect and manage data. This big data plays an important role in market analysis, decision-making, artificial intelligence, and other fields. Machine learning is a common means of analyzing and mining big data and a core research direction of artificial intelligence, so it is very important to construct an efficient big learning platform. In this paper, we introduce LIBBLE-Spark, a distributed machine learning platform that we designed and implemented. LIBBLE-Spark is built on Spark and takes full advantage of Spark's memory-based computing. Applications on it can be deeply integrated into the Spark data processing pipeline, thereby reducing data I/O time. LIBBLE-Spark comprises three main pieces of work:

1. Based on Spark, we implement several regression, classification, and clustering algorithms. In the regression and classification algorithms, we use SCOPE to optimize the model, because its logic and framework are well suited to Spark. As a result, the implementations converge fast and communicate less. LIBBLE-Spark also provides an interface for user-defined generalized linear models, so that users can optimize their own models quickly with the provided optimization cores.
2. We optimize the implementations of sparse learning. For learning sparse models, we propose a trick called Lazy Shrinkage for L1 regularization, which effectively reduces the computational overhead on high-dimensional sparse data.
3. For production environments, or settings where machines can be preempted, we propose a synchronization protocol called Partial Synchronous Parallel (PSP) and implement it on Spark. PSP can effectively reduce waiting time and accelerate the optimization process without affecting the convergence of the algorithm.
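The abstract does not spell out how Lazy Shrinkage works, so the following is only a minimal sketch of the general idea of deferring L1 shrinkage on sparse data, not the thesis's implementation. It assumes proximal SGD with a constant step size `eta`, L1 strength `lam`, and a squared loss; all function names are hypothetical. The key observation is that a coordinate absent from a sample has zero loss gradient, so its update is shrinkage alone, and `k` consecutive shrinkage steps by `eta * lam` equal one shrinkage by `k * eta * lam`. The lazy variant therefore touches only the nonzero coordinates of each sample and catches up on pending shrinkage when a coordinate next appears.

```python
import random


def soft_threshold(w, tau):
    """Shrink w toward zero by tau (the proximal operator of tau * |w|)."""
    if w > tau:
        return w - tau
    if w < -tau:
        return w + tau
    return 0.0


def sgd_l1_eager(samples, dim, eta, lam):
    """Reference version: every step updates all `dim` coordinates."""
    w = [0.0] * dim
    for x, y in samples:  # x is a sparse vector stored as {index: value}
        err = sum(w[j] * v for j, v in x.items()) - y
        for j in range(dim):
            grad = err * x.get(j, 0.0)  # zero for coordinates absent from x
            w[j] = soft_threshold(w[j] - eta * grad, eta * lam)
    return w


def sgd_l1_lazy(samples, dim, eta, lam):
    """Lazy version: each step touches only the nonzero coordinates of x."""
    w = [0.0] * dim
    applied = [0] * dim  # number of steps whose shrinkage w[j] already reflects
    t = 0                # number of completed steps
    for x, y in samples:
        # Catch up: apply the shrinkage skipped while coordinate j was absent.
        for j in x:
            w[j] = soft_threshold(w[j], (t - applied[j]) * eta * lam)
        err = sum(w[j] * v for j, v in x.items()) - y
        for j, v in x.items():
            w[j] = soft_threshold(w[j] - eta * err * v, eta * lam)
            applied[j] = t + 1
        t += 1
    # Flush the remaining pending shrinkage so w matches the eager result.
    for j in range(dim):
        w[j] = soft_threshold(w[j], (t - applied[j]) * eta * lam)
    return w


def make_sparse_samples(n, dim, nnz, seed=0):
    """Generate n random sparse samples with nnz nonzeros each (toy data)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        idx = rng.sample(range(dim), nnz)
        x = {j: rng.uniform(-1.0, 1.0) for j in idx}
        samples.append((x, rng.uniform(-1.0, 1.0)))
    return samples
```

Under these assumptions the two functions produce the same weights up to floating-point rounding, but the lazy version does work proportional to the number of nonzeros per sample rather than to the full dimension, which is where the saving on high-dimensional sparse data comes from.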
Keywords/Search Tags: Machine Learning, Platform, Distributed, Spark, Sparse Learning