
Research On Gradient Descent Optimization Algorithm In Deep Neural Network Training

Posted on: 2023-07-25  Degree: Master  Type: Thesis
Country: China  Candidate: M D Yao  Full Text: PDF
GTID: 2558307058963819  Subject: Computer application technology
Abstract/Summary:
Deep neural networks are at the core of deep learning, and the back-propagation algorithm built on gradient descent is the cornerstone of neural network training. At present, deep neural networks are trained mainly with gradient descent to find a set of optimal parameters. Although deep neural networks are very powerful, they are difficult to optimize. First, the loss function of a neural network is non-convex, so the global optimum is hard to find. Second, the parameters of deep networks and their training datasets are usually very large, which makes computationally expensive second-order optimization algorithms impractical, while first-order optimization algorithms train relatively slowly. Suitable optimization algorithms therefore need to be selected for training deep neural networks.

This thesis studies the optimization problems and training methods of gradient descent as used in deep neural network training. We first propose a multi-stage gradient descent optimization method for training deep learning models. Building on this, a gradient descent optimization method based on multi-stage training and method combination is proposed by incorporating the idea of combining methods. Finally, a gradient descent optimization method based on cyclically decreasing the batch size during deep neural network training is proposed. The contributions of this thesis are as follows:

First, three factors that influence gradient descent in deep neural network training are analyzed: the learning rate, gradient estimation, and the batch size. Viewed from the two perspectives of learning rate and gradient estimation, adaptive optimization algorithms can be seen as improved methods that combine learning rate adjustment with gradient estimation. On this basis, we propose a gradient descent optimization method built on this combination idea.

Second, inspired by Warmup, CLR, and SGDR, the multi-stage idea is integrated with the combination idea and applied to gradient optimization, yielding a gradient descent optimization method for deep learning model training based on multi-stage training and method combination. Extensive experiments show that the proposed method performs well and improves the training of deep neural network models.

Finally, the effect of the batch size on neural network training is analyzed from four perspectives: statistics, training time, gradient change, and generalization ability. The optimization benefit of decreasing the batch size is analyzed in terms of the change of the gradient during parameter updates, model robustness, and training noise, taking realistic hardware limitations into account. The reasons why cyclic schedules are widely used in deep learning are also analyzed. Combining the ideas of decreasing batch size and cycling, we propose a deep neural network training method based on cyclically decreasing the batch size, and its effectiveness is verified by extensive experiments.
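The "combination" reading of adaptive optimizers can be illustrated with a minimal sketch: an Adam-style update combines a gradient-estimation part (a moving average of the gradient) with a learning-rate-adjustment part (per-parameter scaling by a squared-gradient average). This is a generic illustration under assumed hyperparameters, not the exact method proposed in the thesis.

import numpy as np

def adaptive_step(param, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-style update, read as 'gradient estimation' + 'learning-rate adjustment'."""
    # Gradient-estimation part: exponential moving average of the gradient (momentum).
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    # Learning-rate-adjustment part: per-parameter scale from the squared-gradient average.
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2
    state["t"] += 1
    m_hat = state["m"] / (1 - beta1 ** state["t"])  # bias correction
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return param - lr * m_hat / (np.sqrt(v_hat) + eps), state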
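A minimal sketch of a multi-stage learning-rate schedule in the spirit of Warmup, CLR, and SGDR: a linear warmup stage followed by cosine-annealed cycles with warm restarts. The stage boundaries and constants are assumptions chosen for illustration; the thesis's actual multi-stage and combination scheme is not reproduced here.

import math

def multi_stage_lr(step, base_lr=0.1, warmup_steps=500, cycle_len=2000, min_lr=1e-4):
    """Multi-stage schedule: linear warmup, then cosine cycles with warm restarts (SGDR-like)."""
    if step < warmup_steps:
        # Stage 1: linear warmup from near zero up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Later stages: cosine annealing restarted every cycle_len steps.
    pos = ((step - warmup_steps) % cycle_len) / cycle_len
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * pos))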
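The idea of cyclically decreasing the batch size can likewise be sketched as a schedule that starts each cycle with a large batch and shrinks it as the cycle progresses, then restarts from the largest size. The concrete sizes, cycle length, and the commented training loop (including the make_dataloader helper) are hypothetical and only illustrate the general pattern.

def cyclic_batch_size(epoch, sizes=(256, 128, 64, 32)):
    """Batch size decreases within each cycle, then restarts from the largest size."""
    return sizes[epoch % len(sizes)]

# Hypothetical usage with a generic framework-style loop:
# for epoch in range(num_epochs):
#     bs = cyclic_batch_size(epoch)
#     loader = make_dataloader(train_set, batch_size=bs)  # assumed helper
#     for x, y in loader:
#         loss = model.loss(x, y)
#         loss.backward(); optimizer.step(); optimizer.zero_grad()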
Keywords/Search Tags: Gradient Descent, Learning Rate, Batch Size, Cycle Strategy, Deep Neural Network Training