
Application And Research Of Adaptive Optimization Algorithm In Deep Learning

Posted on: 2021-04-18 | Degree: Master | Type: Thesis
Country: China | Candidate: C Tao | Full Text: PDF
GTID: 2518306107459444 | Subject: Mathematics
Abstract/Summary:
Deep learning models are highly non-linear, composed of multi-layer neural networks, and exhibit very strong expressive power on large-scale data sets. Adaptive algorithms such as AdaGrad, RMSProp, and Adam automatically adjust the learning rate of each parameter, have shown very fast training speeds on many tasks, and have become the mainstream optimization methods in current deep learning. However, the learning rate of adaptive algorithms such as Adam is unstable, so their generalization on the test set is often inferior to that of stochastic gradient descent (SGD), and they even fail to converge in some cases. Recently, new algorithms such as AMSGrad have been proposed to address this problem, but they have not achieved a satisfactory improvement over existing methods.

To address the problem of unstable learning rates, this thesis proposes using adaptive friction coefficients to suppress the oscillation of the learning rate, yielding improved versions of Adam and AMSGrad, called TAdam and TAMSGrad, respectively. Taking TAdam as an example, this thesis first uses an online learning framework to prove that TAdam achieves the same convergence rate as Adam when the objective function is convex. Furthermore, we prove that TAdam achieves the same convergence guarantees as SGD in the stochastic non-convex setting.

Finally, on the two major task areas of natural language processing and computer vision, the proposed adaptive algorithms are compared with other commonly used optimization algorithms. The experimental results show that TAdam and TAMSGrad effectively alleviate the learning-rate oscillation problem: they not only retain the fast convergence of adaptive algorithms during training, but also match or even exceed the generalization performance of SGD on the test set, with the improvement being especially pronounced on complex deep neural networks.
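The abstract does not give the TAdam or TAMSGrad update rules. For readers unfamiliar with how adaptive methods set per-parameter learning rates, the following is a minimal NumPy sketch of the standard Adam update, showing the per-parameter effective step size whose oscillation the thesis aims to damp. The friction-coefficient mechanism itself is not reproduced, since the abstract does not specify it; the function and variable names are illustrative only.

import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias-corrected estimates (t is the 1-based iteration counter).
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Per-parameter effective learning rate: lr / (sqrt(v_hat) + eps).
    # This is the quantity whose instability the thesis targets; TAdam would
    # additionally damp its oscillation with an adaptive friction coefficient
    # (not shown here, as the abstract does not state the exact rule).
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v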
Keywords/Search Tags: Stochastic gradient descent, Adaptive gradient algorithm, Learning rate, Adam, AMSGrad