
Deep Nets Training Via Distributed Approximate Newton-Type Method With Adam-Based Local Optimization

Posted on: 2021-03-18
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Bi
Full Text: PDF
GTID: 2428330647952383
Subject: Control Engineering
Abstract/Summary:
Distributed learning is a promising tool for alleviating the pressure of ever-increasing data and model scale in modern machine learning systems. The DANE algorithm is an approximate Newton-type method widely used for communication-efficient distributed machine learning. Compared with traditional methods, DANE exhibits sharp convergence behavior and does not require computing the inverse of the Hessian matrix, which significantly reduces communication and computational costs in high-dimensional settings.

To further improve computational efficiency, this thesis studies how to accelerate the local optimization step of DANE. We replace the stochastic gradient descent (SGD) method conventionally used by DANE for solving the local sub-problems with Adam, one of the most popular adaptive gradient optimization algorithms. Moreover, we add random sampling steps during the iterations to reduce the per-iteration computational cost and to simulate multi-machine computation. In the experiments, three different local sample sizes are compared. The results show that, as long as the local sample size is set appropriately, the proposed Adam-based optimization converges noticeably faster than the original SGD-based implementation; however, using Adam also brings a certain decrease in generalization performance.

To address the loss of generalization performance caused by Adam, this thesis introduces SWATS, a mixed strategy that adaptively switches from Adam to SGD during training. Experiments show that this strategy retains the advantages of Adam in the early training phase while improving the accuracy of the final model. Finally, the optimized algorithm is applied to distributed training on the MXNet platform. The experimental results show that, as the number of parallel machines increases, training speed improves significantly, and the proposed Adam-based optimization converges significantly faster than the original SGD-based implementation with almost no sacrifice in model generalization performance.
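The core idea summarized above is to solve DANE's regularized local sub-problem on each worker with Adam rather than SGD. The following Python sketch is for illustration only and is not code from the thesis: all names (adam_local_solver, local_grad, mu, eta, and so on) are hypothetical, and the local objective follows the standard DANE formulation, where each worker minimizes f_i(w) minus a linear correction term plus a proximal term around the current global iterate.

import numpy as np

def adam_local_solver(w_t, local_grad, global_grad, steps=100, lr=1e-3,
                      mu=0.1, eta=1.0, beta1=0.9, beta2=0.999, eps=1e-8):
    # DANE correction term, fixed during the local solve: it combines the
    # worker's local gradient at w_t with the averaged global gradient that
    # is communicated once per outer round.
    correction = local_grad(w_t) - eta * global_grad
    w = w_t.copy()
    m = np.zeros_like(w)   # Adam first-moment (mean) estimate
    v = np.zeros_like(w)   # Adam second-moment (uncentered variance) estimate
    for k in range(1, steps + 1):
        # Gradient of the DANE local objective:
        #   f_i(w) - correction^T w + (mu / 2) * ||w - w_t||^2
        g = local_grad(w) - correction + mu * (w - w_t)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** k)   # bias-corrected moment estimates
        v_hat = v / (1 - beta2 ** k)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

# Toy usage on a quadratic local loss f_i(w) = 0.5 * ||A w - b||^2.
A = np.random.randn(20, 5)
b = np.random.randn(20)
local_grad = lambda w: A.T @ (A @ w - b)
w0 = np.zeros(5)
global_grad = local_grad(w0)   # in a real run: the gradient averaged over all workers
w_new = adam_local_solver(w0, local_grad, global_grad)

In an actual distributed run, for example on MXNet, local_grad would be a mini-batch gradient over the worker's randomly sampled local data, and global_grad would be the averaged gradient gathered once per outer DANE round; a SWATS-style strategy would additionally monitor the Adam steps and switch the inner loop to plain SGD once the estimated SGD learning rate stabilizes.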
Keywords/Search Tags:Deep learning, approximate Newton method, distributed optimization, Adam algorithm, random sampling