Deep learning models are highly non-linear, multi-layer neural networks that exhibit strong expressive power on large-scale data sets. Adaptive algorithms such as AdaGrad, RMSProp, and Adam automatically adjust the learning rate of each parameter, train very quickly on many tasks, and have become the mainstream optimization methods in deep learning. However, the learning rates of adaptive algorithms such as Adam are unstable; as a result, their generalization on the test set is often inferior to stochastic gradient descent (SGD), and in some cases they even fail to converge. Recently, new algorithms such as AMSGrad have been proposed to address this problem, but they have not achieved a satisfactory improvement over existing methods. To address the instability of the learning rate, this paper proposes adaptive friction coefficients to suppress learning-rate oscillation, yielding improved versions of Adam and AMSGrad, called TAdam and TAMSGrad, respectively. Taking TAdam as an example, we first prove, within an online learning framework, that TAdam achieves the same convergence rate as Adam when the objective function is convex. We further prove that TAdam attains the same convergence guarantees as SGD in the stochastic non-convex setting. Finally, we compare the proposed adaptive algorithms with other commonly used optimizers on two major classes of tasks, natural language processing and computer vision. The experimental results show that TAdam and TAMSGrad effectively alleviate learning-rate oscillation: they retain the fast convergence of adaptive algorithms during training while matching or even exceeding the generalization performance of SGD on the test set, and the improvement is especially pronounced on complex deep neural networks.
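For reference, the baseline update that TAdam modifies is the standard Adam rule (Kingma & Ba). The abstract does not give TAdam's exact update, so the adaptive friction coefficient is not implemented here; the sketch below shows only plain Adam, i.e. the per-parameter step whose oscillation the proposed friction term would damp.

```python
# Minimal sketch of the standard Adam update (the baseline TAdam builds on).
# The TAdam friction coefficient itself is NOT shown, since the abstract
# does not specify its formula.
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step index."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy run: minimize f(x) = (x - 3)^2 starting from x = 0.
x, m, v = np.array([0.0]), np.zeros(1), np.zeros(1)
for t in range(1, 2001):
    g = 2 * (x - 3)
    x, m, v = adam_step(x, g, m, v, t)
```

After the loop, `x` settles near the minimizer 3; the transient oscillation around it as `v_hat` adapts is exactly the effective-step instability that TAdam's friction coefficient is designed to suppress.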