
Research On Distributed Adaptive Stochastic Gradient Descent Optimization Algorithms With Spark MLlib

Posted on: 2019-11-22
Degree: Master
Type: Thesis
Country: China
Candidate: S Q Fan
Full Text: PDF
GTID: 2428330545476727
Subject: Computer Science and Technology
Abstract/Summary:
Non-convex optimization problems arise naturally in many machine learning settings (e.g., (un)supervised learning, Bayesian learning). For optimization problems in machine learning and deep learning, Stochastic Gradient Descent (SGD) has become the de facto iterative learning algorithm, and many variants of gradient descent have been proposed. However, none of them consider the root cause of oscillation, which occurs when the current training step overshoots the optimum. Distributed optimization methods have also become a prerequisite, because a single machine cannot handle the rapidly growing data volumes and model sizes. Unfortunately, traditional SGD is essentially serial, which makes it unsuitable for large datasets, so researchers have proposed a variety of distributed optimization algorithms. Apache Spark is a unified analytics engine for large-scale data processing, and MLlib is Spark's scalable machine learning library. However, in the current MLlib implementation of SGD the gradients must be synchronized once in every iteration, which can lead to a very slow convergence rate. In addition, the frequent parameter aggregation in MLlib SGD introduces time-consuming shuffle operations when the model dimension is high.

In this paper, we propose a distributed adaptive stochastic gradient descent algorithm based on oscillation analysis and integrate it with the data-parallel MLlib SGD. We also propose an iterative optimization algorithm based on local search and a communication optimization algorithm based on a parameter server to address the shortcomings of the MLlib SGD implementation. The primary contributions of this paper are as follows:

(1) We propose OAA-SGD, a distributed adaptive gradient descent algorithm based on an analysis of the root causes of oscillation. To verify the effectiveness of OAA-SGD, we use Matlab to analyze its classification results and convergence behavior on a logistic regression benchmark on a single machine. Experiments show that OAA-SGD achieves better classification results and a faster convergence rate than existing methods.

(2) We propose LS-SGD, an iterative optimization algorithm with local search, to remedy the inefficient use of broadcast variables in the MLlib SGD implementation. LS-SGD runs multiple rounds of local iterations on each local data shard within every global iteration (a generic sketch of this scheme is given after the abstract). Experimental results show that LS-SGD converges faster than MLlib SGD on linear regression problems. In addition, the convergence of LS-SGD is proved theoretically.

(3) We propose a distributed adaptive SGD algorithm that builds on OAA-SGD and LS-SGD to address the limitations of the original MLlib SGD implementation. Combining LS-SGD with OAA-SGD yields effective control of the number of local iterations, as well as adaptive adjustment of the momentum term and learning rate in a distributed manner.

(4) We propose OLP-SGD, a parameter-server-based algorithm that addresses the single-point problem in MLlib SGD. Spark-based parameter servers store, share, and update the parameters of the network model in a distributed manner. Experiments on a linear regression dataset show that OLP-SGD achieves a 3 to 6 times speedup over MLlib SGD. Experiments on an image classification problem show that OLP-SGD achieves classification results that are not inferior to existing algorithms. Furthermore, OLP-SGD also exhibits good node scalability.
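To make the local-iteration idea in contribution (2) concrete, the sketch below shows a generic local-SGD scheme on Spark: each partition runs several SGD steps on its own data shard, and the driver averages the per-partition models once per global round, so the model is broadcast and aggregated only once per round instead of once per gradient step. This is a minimal sketch, not the thesis implementation of LS-SGD; the object name LocalSgdSketch, the squared-loss update, the toy dataset, and the hyperparameters (lr, localIters, globalRounds) are all illustrative assumptions.

import org.apache.spark.sql.SparkSession
import scala.util.Random

object LocalSgdSketch extends Serializable {

  // One SGD step for squared loss on a single (features, label) example.
  def sgdStep(w: Array[Double], x: Array[Double], y: Double, lr: Double): Array[Double] = {
    val pred = w.zip(x).map { case (wi, xi) => wi * xi }.sum
    val err  = pred - y
    w.zip(x).map { case (wi, xi) => wi - lr * err * xi }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("local-sgd-sketch").getOrCreate()
    val sc = spark.sparkContext

    // Toy dataset: y = 2*x1 + 3*x2, split into four data shards (partitions).
    val data = sc.parallelize(Seq.fill(1000) {
      val x = Array(Random.nextDouble(), Random.nextDouble())
      (x, 2.0 * x(0) + 3.0 * x(1))
    }, numSlices = 4).cache()

    var w = Array(0.0, 0.0)      // global model
    val lr = 0.1                 // illustrative learning rate
    val localIters = 10          // several local passes per global round
    val globalRounds = 20        // only one aggregation per round

    for (_ <- 1 to globalRounds) {
      val bw = sc.broadcast(w)   // broadcast the current model once per round
      // Each partition refines its own copy of the model on its local shard.
      val models = data.mapPartitions { shard =>
        val examples = shard.toArray
        var local = bw.value.clone()
        for (_ <- 1 to localIters; (x, y) <- examples) {
          local = sgdStep(local, x, y, lr)
        }
        Iterator(local)
      }.collect()
      // Average the per-partition models: one communication step per round.
      w = models.transpose.map(col => col.sum / col.length)
      bw.destroy()
    }

    println(s"learned weights: ${w.mkString(", ")}")   // should approach 2.0, 3.0
    spark.stop()
  }
}

Compared with synchronizing gradients at every step, this layout trades some statistical efficiency for far fewer broadcast and aggregation rounds, which is the communication pattern the abstract attributes to LS-SGD.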
Keywords/Search Tags:Optimization Algorithm, SGD, Deep Learning, Spark MLlib