Deep neural networks are the core of the deep learning field, and the back-propagation algorithm, built on gradient descent, is the cornerstone of neural network training. At present, deep neural networks are mainly trained with gradient descent algorithms to find a set of optimal parameters. Although deep neural networks are very powerful, they are difficult to optimize. First, the loss function of a neural network is non-convex, so it is hard to find the global optimum. Second, the parameters of deep neural networks and the training datasets are usually very large, which makes computationally expensive second-order optimization algorithms impractical, while the training efficiency of first-order optimization algorithms is relatively low. Therefore, suitable optimization algorithms need to be selected for training deep neural networks. This paper studies the optimization problems and training methods of the gradient descent methods used in training deep neural networks. We first propose a multi-stage gradient descent optimization method for training deep learning models. Building on this, we incorporate the idea of method combination and propose a gradient descent optimization method for deep learning model training based on multiple stages and method combination. Finally, we propose a gradient descent optimization method based on cyclically decreasing the batch size during deep neural network training. The main contributions of this paper are as follows.

First, three factors that influence gradient descent in deep neural network training are analyzed: the learning rate, the gradient estimate, and the batch size. Viewed from the two perspectives of learning rate and gradient estimation, adaptive optimization algorithms can be seen as improved methods that combine learning rate adjustment with gradient estimation. Hence, we propose a gradient descent optimization method based on this combination idea.

Second, inspired by three methods, Warmup, CLR, and SGDR, the multi-stage idea is integrated with the combination idea and applied to gradient-based optimization, yielding the proposed gradient descent optimization method for deep learning model training based on multiple stages and method combination. Extensive experiments show that the proposed method works well and improves the training of deep neural network models.

Finally, the effects of the batch size on neural network training are analyzed from four perspectives: statistics, training time, gradient change, and generalization ability. The optimization benefit of decreasing the batch size is analyzed in terms of the change of the gradient used for parameter updates, model robustness, and training noise, while taking realistic hardware limitations into account. Meanwhile, the mechanism underlying the wide use of cyclic schedules in deep learning is analyzed. Combining the ideas of decreasing the batch size and cycling, we propose a deep neural network training method based on cyclically decreasing the batch size, and its effectiveness is verified by extensive experiments.
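To make the multi-stage and method-combination idea concrete, the sketch below combines a linear warmup stage with SGDR-style cosine restarts into a single per-step learning-rate schedule. This is only an illustration of the general idea under assumed hyperparameter names (base_lr, warmup_steps, cycle_len, min_lr); the exact staging and combination rules used in this paper may differ.

```python
import math

def multi_stage_lr(step, base_lr=0.1, warmup_steps=500,
                   cycle_len=2000, min_lr=1e-4):
    """Per-step learning rate: linear warmup, then cosine annealing
    with warm restarts (SGDR-style cycles)."""
    if step < warmup_steps:
        # Stage 1: ramp the learning rate linearly from ~0 up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Stage 2: cosine annealing, restarted every cycle_len steps.
    pos = (step - warmup_steps) % cycle_len
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * pos / cycle_len))
```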
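Similarly, the following sketch shows one possible way to decrease the batch size cyclically: within each cycle the batch size is halved at regular intervals from a maximum down to a minimum, then reset at the start of the next cycle. The function name and hyperparameters (max_bs, min_bs, cycle_epochs) are illustrative assumptions rather than the exact schedule proposed in this paper; a training loop would rebuild its data loader whenever the returned batch size changes.

```python
import math

def cyclic_batch_size(epoch, max_bs=256, min_bs=32, cycle_epochs=12):
    """Per-epoch batch size: halve from max_bs down to min_bs within each
    cycle, then reset to max_bs when a new cycle begins."""
    pos = epoch % cycle_epochs                       # position inside the current cycle
    n_levels = int(math.log2(max_bs // min_bs)) + 1  # e.g. 256, 128, 64, 32 -> 4 levels
    level = min(pos * n_levels // cycle_epochs, n_levels - 1)
    return max(max_bs >> level, min_bs)
```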