
Research On Parallel And Distributed Stochastic Learning Algorithms

Posted on: 2021-09-30
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S Y Zhao
GTID: 1488306500467464
Subject: Computer Science and Technology
Abstract/Summary:
Stochastic learning has been widely used in many machine learning applications. The goal of a stochastic learning algorithm is to iteratively learn an optimal parameter of the machine learning model. In each iteration, the algorithm randomly selects some training data and then computes a vector to update the model parameter; stochastic gradient descent (SGD) is a typical example. With the rapid growth of data, parallel stochastic learning on multi-core systems and distributed stochastic learning on multi-machine systems have attracted much attention. Compared with traditional sequential stochastic learning algorithms, parallel and distributed stochastic learning algorithms can process more training data at the same time. Based on multi-core systems and multi-machine systems, this dissertation systematically studies stochastic learning in four aspects: parallel stochastic learning for convex problems, parallel stochastic learning for non-convex problems, distributed stochastic learning for convex problems, and distributed stochastic learning for non-convex problems. The contributions are outlined as follows:

· In the research on parallel stochastic learning for convex problems, although existing works have empirically found that in some cases the lock-free strategy is much more efficient than strategies with locks, no work has theoretically proved the convergence of stochastic learning algorithms under the lock-free strategy. We propose a novel fast method called asynchronous SVRG (AsySVRG). By constructing a sequential learning process that is equivalent to the asynchronous learning process in AsySVRG, and adopting a diagonal matrix to describe the overwriting among threads, we theoretically prove that AsySVRG achieves a linear convergence rate under the lock-free strategy. To the best of our knowledge, this is the first work on parallel stochastic learning with a theoretical guarantee of linear convergence under the lock-free strategy. Empirical results show that AsySVRG converges faster than existing methods. (A minimal sketch of the lock-free update pattern is given after this list.)

· In the research on parallel stochastic learning for non-convex problems, no work has theoretically proved the convergence of lock-free stochastic learning methods. We first generalize the lock-free property of the equivalent sequential learning process introduced in the analysis of AsySVRG for convex problems. In particular, as long as the stochastic updating vectors computed by the threads are Lipschitz continuous, the difference between the read sequence and the write sequence defined in the equivalent sequential learning process can be bounded by the second-order moment of the norm of the stochastic updating vectors. Benefiting from this property, we theoretically show that both Hogwild! and AsySVRG converge under the lock-free strategy for non-convex problems. To the best of our knowledge, this is the first work that theoretically proves the convergence of Hogwild! under the lock-free strategy. Moreover, both theoretical and empirical results show that AsySVRG achieves a faster convergence rate than Hogwild! on non-convex problems.

· In the research on distributed stochastic learning for convex problems, the communication complexity of existing works depends on the condition number of the objective function: the larger the condition number, the higher the communication complexity. In machine learning applications the condition number is usually large, so existing methods are inefficient. We propose a novel distributed stochastic learning method called scalable composite optimization for learning (SCOPE), which adopts the variance reduction technique and a local learning strategy. We theoretically show that, compared with existing distributed stochastic learning methods, SCOPE significantly reduces the communication complexity, which is independent of the condition number. We also propose a novel distributed stochastic learning method called proximal SCOPE (pSCOPE) for sparse learning with non-smooth regularization; its communication complexity is likewise independent of the condition number. Furthermore, we theoretically show that both SCOPE and pSCOPE achieve a linear convergence rate. Empirical results verify that both SCOPE and pSCOPE significantly reduce training time compared with existing methods. (A simplified sketch of the local-learning pattern appears after this list.)

· In the research on distributed stochastic learning for non-convex problems, large-batch training is a common strategy for reducing communication complexity. Existing large-batch training methods rely on many training tricks to avoid a drop in generalization ability, but these tricks lack theoretical guarantees. We first prove the convergence of an existing stochastic learning method, stochastic normalized gradient descent (SNGD), for non-convex problems and show that the batch size of SNGD is not restricted by the smoothness constant of the objective function, which differs from existing SGD-based stochastic learning methods. Based on SNGD, we propose a novel stochastic learning method called stochastic normalized gradient descent with momentum (SNGM), a momentum variant of SNGD. We theoretically show that SNGM achieves the same convergence rate as momentum SGD (MSGD). Benefiting from the normalization of gradients, SNGM can adopt a larger batch size than MSGD, so that in the distributed training setting SNGM reduces the communication complexity compared with MSGD. Empirical results verify that, compared with MSGD with a large batch size, SNGM with a large batch size achieves better performance on test data, and that, compared with MSGD with a small batch size, SNGM with a large batch size achieves the same test performance while reducing the number of communication rounds. (A sketch of the normalized-gradient momentum update follows this list.)
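To make the lock-free access pattern concrete, here is a minimal Python sketch in the spirit of AsySVRG: several threads read and write a shared parameter vector with no lock, and each thread applies an SVRG-style variance-reduced stochastic gradient. The logistic-regression objective, the hyperparameters, and the thread/epoch counts are illustrative assumptions, not the configuration analyzed in the dissertation, and Python's GIL means this demonstrates only the access pattern rather than real multi-core speed-up.

```python
# Minimal lock-free (Hogwild!-style) sketch of an SVRG-type update on a shared
# parameter vector. Illustrative assumptions only, not the dissertation's setup.
import numpy as np
import threading

rng = np.random.default_rng(0)
n, d = 1000, 20
X = rng.normal(size=(n, d))
y = rng.choice([-1.0, 1.0], size=n)
w = np.zeros(d)                      # shared parameter vector, updated without locks

def grad(wv, i):
    """Stochastic gradient of the logistic loss on example i."""
    z = y[i] * X[i].dot(wv)
    return -y[i] * X[i] / (1.0 + np.exp(z))

def full_grad(wv):
    return np.mean([grad(wv, i) for i in range(n)], axis=0)

def worker(num_iters, lr, snapshot, mu):
    global w
    for _ in range(num_iters):
        i = rng.integers(n)
        w_read = w.copy()            # inconsistent read is allowed (no lock)
        # SVRG-style variance-reduced update vector
        v = grad(w_read, i) - grad(snapshot, i) + mu
        w -= lr * v                  # lock-free write; threads may overwrite each other

for epoch in range(5):               # each epoch refreshes the snapshot and full gradient
    snapshot = w.copy()
    mu = full_grad(snapshot)
    threads = [threading.Thread(target=worker, args=(200, 0.05, snapshot, mu))
               for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```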
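The next sketch is a single-process simulation of the "variance reduction + local learning" idea attributed to SCOPE above: data are partitioned across workers, a full gradient is aggregated once per communication round, each worker runs many variance-reduced local updates on its own partition, and the local models are then averaged. It is a hypothetical simplification, not the exact SCOPE or pSCOPE algorithm; in particular, the proximal step of pSCOPE is omitted.

```python
# Simplified simulation of a "variance reduction + local learning" scheme in the
# spirit of SCOPE. One communication round = aggregate a full gradient, let each
# worker run local variance-reduced updates on its partition, then average.
# This is an illustrative assumption, not the exact SCOPE algorithm.
import numpy as np

rng = np.random.default_rng(1)
n, d, num_workers = 1200, 10, 4
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X.dot(w_true) + 0.1 * rng.normal(size=n)
parts = np.array_split(np.arange(n), num_workers)   # data partition per worker

def grad(wv, idx):
    """Gradient of the least-squares loss on the examples indexed by idx."""
    Xi, yi = X[idx], y[idx]
    return Xi.T.dot(Xi.dot(wv) - yi) / len(idx)

w = np.zeros(d)
lr, local_iters = 0.01, 50
for round_ in range(20):                 # each round uses O(1) communications
    mu = grad(w, np.arange(n))           # aggregated full gradient (communication)
    snapshot = w.copy()
    local_models = []
    for p in parts:                      # executed in parallel on a real cluster
        u = w.copy()
        for _ in range(local_iters):     # local variance-reduced updates, no communication
            i = rng.choice(p)
            v = grad(u, [i]) - grad(snapshot, [i]) + mu
            u -= lr * v
        local_models.append(u)
    w = np.mean(local_models, axis=0)    # aggregate local models (communication)
```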
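Finally, a short sketch of one plausible form of the normalized-gradient momentum update behind SNGM: every mini-batch gradient is normalized before entering the momentum buffer, which is the property that permits larger batch sizes than MSGD in the dissertation's analysis. The exact update rule, the least-squares toy objective, and all hyperparameters below are assumptions for illustration.

```python
# Sketch of a normalized-gradient momentum update in the spirit of SNGM.
# The precise update rule and hyperparameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
n, d = 2048, 30
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

def minibatch_grad(w, batch):
    """Stochastic gradient of 0.5*||Aw - b||^2 / n on a mini-batch."""
    Ab, bb = A[batch], b[batch]
    return Ab.T.dot(Ab.dot(w) - bb) / len(batch)

def sngm(lr=0.1, beta=0.9, batch_size=256, steps=300):
    w = np.zeros(d)
    u = np.zeros(d)                       # momentum buffer
    for _ in range(steps):
        batch = rng.choice(n, size=batch_size, replace=False)
        g = minibatch_grad(w, batch)
        g_norm = np.linalg.norm(g)
        if g_norm > 0:
            g = g / g_norm                # normalize the mini-batch gradient
        u = beta * u + g                  # momentum on normalized gradients
        w = w - lr * u
    return w

w_hat = sngm()
```

Because the step length is controlled by the normalization rather than by the raw gradient magnitude, the batch size in this sketch can be increased without the update blowing up, which mirrors the large-batch motivation described above.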
Keywords/Search Tags: stochastic learning, convergence rate, lock-free strategy, communication complexity