
AdamX Optimizer: A New Optimizer Based On Gradient And Momentum Coordination Of Learning Rate

Posted on: 2023-02-06
Degree: Master
Type: Thesis
Country: China
Candidate: L Zhang
Full Text: PDF
GTID: 2568306902998649
Subject: Financial mathematics and financial engineering
Abstract/Summary:
Parameter optimization is a crucial part of deep learning. It trains the network parameters through the model's loss function, minimizing that function to obtain the best parameters of the network. The parameters of a deep learning network are the key to the model and determine how accurately the network fits the data. The optimization algorithm is the core of parameter training: a good optimization algorithm helps the network find the optimal parameters faster and more accurately, improving both the accuracy and the efficiency of the model.

First-order optimization algorithms fall into two categories: stochastic gradient descent (SGD) and adaptive gradient methods. SGD is one of the earliest methods and is still widely used in some fields. However, SGD keeps the learning rate fixed, so it easily falls into local optima in the early stage of parameter training, which makes it inefficient. Adaptive gradient methods, represented by the Adam algorithm, achieve fast training of deep neural networks by scaling the learning rate with the second-moment estimate of the gradient, and have achieved great success in machine learning. Although Adam is highly efficient in the early stage of training, the parameters it learns perform poorly on the test set, and its generalization ability is weak compared with SGD.

In this paper, we propose a new optimization algorithm called AdamX. The algorithm achieves fast training while greatly improving generalization. The key to the success of the AdamX optimizer is an additional hyperparameter that coordinates the gradient with the first-order momentum of the gradient to control the learning step size, improving the optimizer's generalization while preserving the efficiency of the adaptive approach. In addition, this paper adopts three methods to further control the learning rate in the late iterations of the optimizer, making late-stage parameter training more stable.

Within the online learning framework, this paper proves the convergence of AdamX's parameter iterations when the loss function is convex. The analysis shows that the regret between the sequence of parameter iterates produced by AdamX and the optimal parameters is O(√T). Thus the training error of AdamX on the parameters is upper-bounded when the loss function is convex.

In this paper, the AdamX optimizer is applied to character image recognition, language processing, and finance. Two classical deep neural networks, a residual network and an LSTM, are used to test the performance of AdamX and to compare it with several recently proposed adaptive optimizers. The experimental results show that AdamX improves the model's accuracy on the test set while retaining fast early-stage parameter training. Across different data sets and deep networks, AdamX consistently shows strong performance and excellent stability. Compared with other improved adaptive algorithms such as AdaBelief and AdaMomentum, AdamX achieves better generalization and higher accuracy on the test set. We conclude that AdamX, as an improvement of the adaptive approach, is an effective optimizer that can be applied to a variety of tasks and deep learning models.
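The abstract does not give AdamX's exact update rule, so the following is only a minimal sketch for orientation: a standard Adam step alongside a hypothetical AdamX-style variant in which an assumed extra hyperparameter `alpha` blends the raw gradient with the first-order momentum to set the step direction. The name `adamx_like_step` and the form of the blend are illustrative assumptions, not the thesis's actual algorithm.

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam update for a scalar parameter theta."""
    m = beta1 * m + (1 - beta1) * grad          # first moment (momentum)
    v = beta2 * v + (1 - beta2) * grad * grad   # second moment
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

def adamx_like_step(theta, grad, m, v, t, lr=0.05, beta1=0.9, beta2=0.999,
                    eps=1e-8, alpha=0.5):
    """Hypothetical AdamX-style update (assumed form): the extra
    hyperparameter alpha coordinates the raw gradient with the
    first-order momentum when choosing the step direction."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    direction = alpha * grad + (1 - alpha) * m_hat  # assumed coordination rule
    theta -= lr * direction / (math.sqrt(v_hat) + eps)
    return theta, m, v

def run(step_fn, steps=5000):
    """Minimize the convex loss f(x) = (x - 3)^2 from theta = 0."""
    theta, m, v = 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        grad = 2 * (theta - 3.0)
        theta, m, v = step_fn(theta, grad, m, v, t)
    return theta

print(run(adam_step))        # approaches the minimizer x = 3
print(run(adamx_like_step))  # also approaches x = 3
```

On this toy convex problem both updates behave similarly; per the abstract, the claimed benefit of the coordination term is better test-set generalization on deep networks, which a one-dimensional example cannot demonstrate.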
Keywords/Search Tags:Deep Learning, Image Recognition, Adaptive Optimization Algorithm, Learning Rate Control