
Stochastic Gradient Methods Based On First-Order Gradient Information In Deep Learning

Posted on: 2023-03-03
Degree: Master
Type: Thesis
Country: China
Candidate: M Ji
Full Text: PDF
GTID: 2568306785962159
Subject: Mathematics

Abstract/Summary:
Theoretically, training a deep neural network can be viewed as solving a nonlinear optimization problem in a high-dimensional space, so optimization methods are of central importance in deep learning. In learning methods based on stochastic gradient descent, modifying the gradient estimate and adjusting the learning rate play an important role in improving the convergence speed and stability of the algorithms. Starting from stochastic gradient methods that use first-order information, this thesis studies first-order algorithms and second-order quasi-Newton algorithms that reduce the variance of the stochastic gradient by modifying the gradient estimate and adjusting the learning rate. The main contributions are as follows.

To address the oscillation of the loss function caused by the high variance of stochastic gradient methods, this thesis builds on stochastic variance-reduced gradient methods and applies both weighted averaging and exponential decay to the gradient estimate in order to reduce the variance of the stochastic gradient. To address the problem of learning-rate adjustment, the historical gradient information is fully exploited to adjust the learning rate automatically. Combining the two ideas yields a new variance-reduced method, referred to as the adaptive stochastic variance-reduced gradient method. Its effectiveness is verified on the MNIST and CIFAR-10 data sets; experimental results show that the new method outperforms the stochastic variance-reduced gradient method and stochastic gradient descent in terms of convergence speed and stability.

To address the huge computational cost of higher-order optimization methods in deep learning, this thesis presents an L-BFGS method that approximates the Hessian matrix from first-order information only. By changing the way the training subsets are selected, a multi-batch L-BFGS method is introduced to improve the stability of the convergence process. In addition, the influence of the learning-rate decay strategy on the convergence speed and stability of the multi-batch L-BFGS method is analyzed in detail, and two fixed learning-rate decay strategies are given. In the numerical experiments, the effectiveness of the multi-batch L-BFGS method is verified on the MNIST and CIFAR-10 data sets; compared with first-order optimization algorithms, the multi-batch L-BFGS method performs better in terms of convergence speed and stability.
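To make the first contribution concrete, the following is a minimal sketch of the kind of update the abstract describes: an SVRG-style variance-reduced gradient combined with an exponentially decayed (weighted-average) gradient estimate and a step size adapted from accumulated historical gradients. The callback names grad_fn and full_grad_fn, the AdaGrad-like accumulation, and all hyperparameter values are illustrative assumptions, not the thesis's exact formulation.

    import numpy as np

    def adaptive_svrg(grad_fn, full_grad_fn, w0, n_samples,
                      epochs=10, inner_steps=100, eta0=0.1,
                      beta=0.9, eps=1e-8, seed=0):
        # grad_fn(w, i): stochastic gradient on sample/mini-batch i (hypothetical callback)
        # full_grad_fn(w): full-batch gradient at the snapshot point (hypothetical callback)
        rng = np.random.default_rng(seed)
        w = w0.copy()
        m = np.zeros_like(w0)    # exponentially decayed (attenuated) gradient estimate
        acc = np.zeros_like(w0)  # accumulated squared gradients used to adapt the step size

        for _ in range(epochs):
            w_snap = w.copy()
            mu = full_grad_fn(w_snap)                 # full gradient at the snapshot point
            for _ in range(inner_steps):
                i = rng.integers(n_samples)
                # standard SVRG variance-reduced gradient estimate
                g = grad_fn(w, i) - grad_fn(w_snap, i) + mu
                # weighted average with exponential decay of past estimates
                m = beta * m + (1.0 - beta) * g
                # learning rate adapted from historical gradient information
                acc += m * m
                w -= (eta0 / (np.sqrt(acc) + eps)) * m
        return w

Similarly, a minimal sketch of a multi-batch L-BFGS loop, assuming that consecutive batches share an overlap set on which the curvature pair is computed, and using a geometric learning-rate decay as a stand-in for the fixed decay strategies mentioned above:

    from collections import deque
    import numpy as np

    def two_loop_direction(g, pairs):
        # Standard L-BFGS two-loop recursion: approximates H^{-1} g
        # from the stored curvature pairs (s, y).
        q = g.copy()
        alphas = []
        for s, y in reversed(pairs):             # newest pair first
            rho = 1.0 / (y @ s)
            a = rho * (s @ q)
            alphas.append((rho, a))
            q -= a * y
        s, y = pairs[-1]
        q *= (s @ y) / (y @ y)                   # initial Hessian scaling
        for (s, y), (rho, a) in zip(pairs, reversed(alphas)):  # oldest pair first
            b = rho * (y @ q)
            q += (a - b) * s
        return q

    def multi_batch_lbfgs(grad_fn, w0, batches, eta=0.1, decay=0.95, memory=10):
        # batches: iterable of (batch, overlap) index sets; 'overlap' is shared with
        # the next batch so the curvature pair is formed from consistent gradients.
        w = w0.copy()
        pairs = deque(maxlen=memory)
        for k, (batch, overlap) in enumerate(batches):
            g = grad_fn(w, batch)                                   # gradient on the current batch
            d = two_loop_direction(g, list(pairs)) if pairs else g  # quasi-Newton direction
            w_new = w - eta * (decay ** k) * d                      # fixed geometric decay (assumed)
            s = w_new - w
            y = grad_fn(w_new, overlap) - grad_fn(w, overlap)       # gradient difference on the overlap
            if s @ y > 1e-10:                                       # keep only curvature-consistent pairs
                pairs.append((s, y))
            w = w_new
        return w

Both sketches are schematic: the thesis's concrete weighting scheme, sampling strategy, and decay schedules may differ from the choices assumed here.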
Keywords/Search Tags:Deep learning, Stochastic gradient descent method, Variance attenuation, Learning rate, L-BFGS method