Deep learning is built on deep neural networks. The factors that most determine a network's performance are the data, the network model, and the optimization algorithm. The optimization algorithm directly determines the values of the millions of parameters in the network model, so it is one of the most important topics in deep learning. Current optimization algorithms for deep learning are mainly based on gradient descent. The Adam algorithm is widely used because of its fast convergence and high accuracy; however, recent studies have pointed out that it has convergence problems, and in some experiments, such as image processing, it performs worse than SGD with a fine-tuned learning rate.

In this paper, the convergence problem of Adam is analyzed through examples and theoretical derivation. It is found that, because of its exponential moving average variable, Adam's average regret does not converge to zero in some optimization problems, which leads to poor convergence. Two corresponding improvements are proposed, namely the CAda and RCAdam algorithms. The first builds on AdaGrad, whose learning rate decreases monotonically: a lower-bound clipping function is introduced to restrict the learning rate, yielding the CAda algorithm. The second builds on Adam itself to alleviate the adverse impact of the moving average variable. Analysis shows that this variable directly causes extreme learning rates and a variance problem in Adam's learning-rate adjustment. By introducing upper- and lower-bound threshold functions, together with a correction term that compensates for the variance problem, the RCAdam algorithm is proposed.

Image classification experiments on the CIFAR-10 dataset compare CAda and RCAdam against the widely used Adam and SGD algorithms. To ensure that the algorithms generalize and perform well across multiple networks, two deeper and more complex convolutional networks, ResNet and DenseNet, are selected for the comparative experiments. The results show that on both ResNet and DenseNet, CAda and RCAdam converge faster and reach higher accuracy than Adam and SGD. CAda achieves top test accuracies of 91.8% and 92.6% on the two networks, and RCAdam achieves 92.6% and 92.0%, both higher than the results obtained by SGD (91.5% and 90.7%) and Adam (90.4% and 90.5%).
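The abstract does not give the exact update rules of CAda or RCAdam, but the core idea it describes, bounding an adaptive per-parameter learning rate with upper and lower clipping thresholds to suppress extreme rates, can be sketched as follows. This is an illustrative sketch only: the function name, the bound values, and the choice to apply clipping to an Adam-style step are all assumptions, not the paper's actual algorithms.

```python
import numpy as np

def clipped_adam_step(theta, grad, m, v, t,
                      alpha=1e-3, beta1=0.9, beta2=0.999,
                      eps=1e-8, lower=1e-4, upper=1e-1):
    """One Adam-style update with the per-parameter learning rate
    clipped to [lower, upper] (bounds are illustrative assumptions)."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment moving average
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment moving average
    m_hat = m / (1 - beta1 ** t)              # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)              # bias-corrected second moment
    lr = alpha / (np.sqrt(v_hat) + eps)       # adaptive per-parameter rate
    lr = np.clip(lr, lower, upper)            # bound the extreme rates
    theta = theta - lr * m_hat
    return theta, m, v

# Usage sketch: a few steps on the quadratic f(theta) = theta^2.
theta, m, v = np.array([2.0]), np.zeros(1), np.zeros(1)
for t in range(1, 201):
    grad = 2.0 * theta
    theta, m, v = clipped_adam_step(theta, grad, m, v, t)
```

The clipping step is where such methods depart from plain Adam: without it, a very small second-moment estimate can make the effective learning rate blow up for individual parameters, which is the "extreme learning rate" behavior the paper attributes to the moving average variable.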