
Deep Nets Training Via Distributed Approximate Newton-Type Method With Adam-Based Local Optimization

Posted on: 2021-03-18
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Bi
Full Text: PDF
GTID: 2428330647952383
Subject: Control Engineering
Abstract/Summary:
Distributed learning is a promising tool for alleviating the pressure of ever-increasing data and model scale in modern machine learning systems. The DANE algorithm is an approximate Newton-type method widely used for communication-efficient distributed machine learning. Compared with traditional methods, DANE exhibits sharp convergence behavior and does not require computing the inverse of the Hessian matrix, which significantly reduces communication and computational costs in high-dimensional settings.

To further improve computational efficiency, this thesis studies how to accelerate the local optimization step of DANE. We replace the stochastic gradient descent (SGD) method conventionally used by DANE for solving the local sub-problems with Adam, one of the most popular adaptive gradient optimization algorithms. Moreover, we add random sampling steps during the iterations to reduce the per-iteration computational cost and to simulate multi-machine computation. In the experiments, three different local sample sizes are compared. The results show that, as long as the local sample size is set appropriately, the proposed Adam-based optimization converges noticeably faster than the original SGD-based implementation; however, using Adam also brings a certain decrease in generalization performance.

To address the loss of generalization performance caused by Adam, this thesis introduces SWATS, a mixed strategy that adaptively switches from Adam to SGD during training. Experiments show that this strategy retains the advantages of Adam in the early training phase while improving the accuracy of the final model. Finally, the optimized algorithm is applied to distributed training on the MXNet platform. The experimental results show that, as the number of parallel machines increases, training speed improves significantly, and the proposed Adam-based optimization converges significantly faster than the original SGD-based implementation with almost no sacrifice in model generalization performance.
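The core idea summarized above is to solve DANE's regularized local sub-problem on each worker with Adam rather than SGD. The following Python sketch is for illustration only and is not code from the thesis: all names (adam_local_solver, local_grad, mu, eta, and so on) are hypothetical, and the local objective follows the standard DANE formulation, where each worker minimizes f_i(w) minus a linear correction term plus a proximal term around the current global iterate.

import numpy as np

def adam_local_solver(w_t, local_grad, global_grad, steps=100, lr=1e-3,
                      mu=0.1, eta=1.0, beta1=0.9, beta2=0.999, eps=1e-8):
    # DANE correction term, fixed during the local solve: it combines the
    # worker's local gradient at w_t with the averaged global gradient that
    # is communicated once per outer round.
    correction = local_grad(w_t) - eta * global_grad
    w = w_t.copy()
    m = np.zeros_like(w)   # Adam first-moment (mean) estimate
    v = np.zeros_like(w)   # Adam second-moment (uncentered variance) estimate
    for k in range(1, steps + 1):
        # Gradient of the DANE local objective:
        #   f_i(w) - correction^T w + (mu / 2) * ||w - w_t||^2
        g = local_grad(w) - correction + mu * (w - w_t)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** k)   # bias-corrected moment estimates
        v_hat = v / (1 - beta2 ** k)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

# Toy usage on a quadratic local loss f_i(w) = 0.5 * ||A w - b||^2.
A = np.random.randn(20, 5)
b = np.random.randn(20)
local_grad = lambda w: A.T @ (A @ w - b)
w0 = np.zeros(5)
global_grad = local_grad(w0)   # in a real run: the gradient averaged over all workers
w_new = adam_local_solver(w0, local_grad, global_grad)

In an actual distributed run, for example on MXNet, local_grad would be a mini-batch gradient over the worker's randomly sampled local data, and global_grad would be the averaged gradient gathered once per outer DANE round; a SWATS-style strategy would additionally monitor the Adam steps and switch the inner loop to plain SGD once the estimated SGD learning rate stabilizes.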
Keywords/Search Tags:Deep learning, approximate Newton method, distributed optimization, Adam algorithm, random sampling