
Research On Distributed Optimization Methods For Large-Scale Machine Learning

Posted on: 2022-04-21
Degree: Master
Type: Thesis
Country: China
Candidate: X F Liang
Full Text: PDF
GTID: 2518306323962479
Subject: Computer application technology

Abstract/Summary:
For large-scale machine learning problems, stochastic gradient descent (SGD) is a fundamental tool. However, as data and model scale grow, training machine learning models, especially deep learning models, has become increasingly time-consuming. Designing distributed algorithms that accelerate training is therefore an important topic in machine learning research.

Distributed stochastic gradient descent, which employs multiple workers in parallel, has been widely adopted for training machine learning models. In practice, however, the communication cost of model averaging prevents such distributed training from achieving a linear time speedup. Local-update algorithms, including Local SGD and FedAvg, have recently attracted much attention because of their appealing properties, such as low communication cost and privacy preservation. Yet these algorithms suffer a significant degradation in convergence rate when the data distributions on workers are non-identical. This thesis improves their convergence rate under heterogeneous data and further reduces the communication cost of distributed optimization.

Previous studies show that the communication complexity of Local SGD is of order O(T^{1/2}) when the data distributions on workers are identical and O(T^{3/4}) when they are non-identical. To keep local-update algorithms from converging slowly on heterogeneous data, we propose Variance Reduced Local SGD (VRL-SGD), which is more robust to high data variance among workers. We theoretically prove that VRL-SGD achieves a linear iteration speedup with the lower communication complexity O(T^{1/2}), compared to O(T^{3/4}) for Local SGD. For practical scenarios with high data variance, we also present an effective warm-up mechanism, VRL-SGD-W, to eliminate the impact of high data variance among workers. Experimental results demonstrate that VRL-SGD performs considerably better than Local SGD on heterogeneous data, and that VRL-SGD-W is much more robust under high data variance.

Algorithms based on Local SGD mostly adopt a fixed communication period and step size. A period or step size that adapts over the course of the iterations, however, often performs better and reaches the optimal solution faster. To further reduce the communication complexity, this thesis builds on VRL-SGD and proposes STagewise Variance Reduced Local SGD (ST-VRL-SGD), which adaptively increases the communication period and decreases the step size. It is proved that the communication complexity is reduced to O(log(T)) for strongly convex functions, whether or not the data distributions on workers are identical. Experimental results show the superiority of the ST-VRL-SGD algorithm for strongly convex functions.
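The abstract names VRL-SGD but does not spell out its update rule, so the sketch below only illustrates the general idea it builds on: Local SGD with periodic model averaging, where each worker additionally applies a control-variate-style correction (in the spirit of SCAFFOLD) so that heterogeneous local gradients do not pull the local models apart between communication rounds. The synthetic heterogeneous least-squares setup, all variable names, and the specific correction rule are assumptions for illustration, not the thesis algorithm.

```python
# Illustrative sketch only: Local SGD with periodic averaging plus a
# SCAFFOLD-style control variate per worker. The actual VRL-SGD update is
# not given in this abstract; the correction rule below is an assumption.
import numpy as np

rng = np.random.default_rng(0)
n_workers, dim, local_steps, rounds, lr = 4, 10, 20, 50, 0.005

# Heterogeneous data: worker k holds its own least-squares problem.
A = [rng.normal(loc=0.5 * k, scale=1.0, size=(100, dim)) for k in range(n_workers)]
b = [rng.normal(size=100) for _ in range(n_workers)]

def stoch_grad(k, x, batch=10):
    """Stochastic gradient of 0.5 * ||A_k x - b_k||^2 / n on a mini-batch."""
    idx = rng.integers(0, 100, size=batch)
    Ak, bk = A[k][idx], b[k][idx]
    return Ak.T @ (Ak @ x - bk) / batch

x_global = np.zeros(dim)
c = [np.zeros(dim) for _ in range(n_workers)]   # per-worker control variates
c_bar = np.zeros(dim)                           # their average

for r in range(rounds):
    local_models, c_new = [], []
    for k in range(n_workers):
        x = x_global.copy()
        for _ in range(local_steps):
            # Corrected local step: the (c_bar - c[k]) term counteracts the
            # drift caused by worker k's non-identical data distribution.
            x -= lr * (stoch_grad(k, x) - c[k] + c_bar)
        local_models.append(x)
        # Refresh the control variate from this round's observed local drift.
        c_new.append(c[k] - c_bar + (x_global - x) / (lr * local_steps))
    x_global = np.mean(local_models, axis=0)    # communication: model averaging
    c, c_bar = c_new, np.mean(c_new, axis=0)

print("final global objective:",
      sum(0.5 * np.mean((A[k] @ x_global - b[k]) ** 2) for k in range(n_workers)))
```

The communication pattern here (average only every `local_steps` iterations) is exactly the trade-off the thesis targets: fewer averaging rounds cut communication cost, but heterogeneous workers then drift apart, which the per-worker correction term is meant to suppress.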
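ST-VRL-SGD is described as adaptively increasing the communication period and decreasing the step size, but the abstract gives no concrete schedule. The snippet below shows one plausible stagewise scheme, doubling the period and halving the step size per stage, purely to illustrate why such schedules need only O(log(T)) communication rounds over T iterations.

```python
# Hypothetical stagewise schedule: the communication period grows and the
# step size shrinks across stages. The doubling/halving constants here are
# illustrative assumptions, not taken from the thesis.
base_period, base_lr, total_iters = 4, 0.1, 10_000

period, lr, t, rounds = base_period, base_lr, 0, 0
while t < total_iters:
    t += period            # run `period` local steps per worker, then communicate
    rounds += 1
    period *= 2            # longer local phases in later stages
    lr /= 2                # smaller steps in later stages
print(f"{total_iters} iterations used only {rounds} communication rounds")
```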
Keywords/Search Tags: Distributed Optimization, Machine Learning, Federated Learning, Variance Reduction, Communication Complexity