
Stochastic Gradient Methods Based On First-Order Gradient Information In Deep Learning

Posted on: 2023-03-03
Degree: Master
Type: Thesis
Country: China
Candidate: M Ji
Full Text: PDF
GTID: 2568306785962159
Subject: Mathematics

Abstract/Summary:
Theoretically, training a deep neural network can be viewed as solving a nonlinear optimization problem in a high-dimensional space, so optimization methods are of central importance in deep learning. In learning methods based on stochastic gradient descent, modifying the gradient estimate and adjusting the learning rate play an important role in improving the convergence speed and stability of the algorithms. Starting from stochastic gradient methods that use first-order information, this thesis studies first-order algorithms and second-order quasi-Newton algorithms that reduce the variance of the stochastic gradient by modifying the gradient estimate and adjusting the learning rate. The main contributions are as follows.

To address the oscillation of the loss function caused by the high variance of stochastic gradient methods, this thesis builds on stochastic variance-reduced gradient methods and applies both weighted averaging and exponential decay to the gradient estimate in order to reduce the variance of the stochastic gradient. To address the problem of learning-rate adjustment, the historical gradient information is fully exploited to adjust the learning rate automatically. Combining the two ideas yields a new variance-reduced method, referred to as the adaptive stochastic variance-reduced gradient method. Its effectiveness is verified on the MNIST and CIFAR-10 data sets; experimental results show that the new method outperforms the stochastic variance-reduced gradient method and stochastic gradient descent in terms of convergence speed and stability.

To address the huge computational cost of higher-order optimization methods in deep learning, this thesis presents an L-BFGS method that approximates the Hessian matrix from first-order information only. By changing the way the training subsets are selected, a multi-batch L-BFGS method is introduced to improve the stability of the convergence process. In addition, the influence of the learning-rate decay strategy on the convergence speed and stability of the multi-batch L-BFGS method is analyzed in detail, and two fixed learning-rate decay strategies are given. In the numerical experiments, the effectiveness of the multi-batch L-BFGS method is verified on the MNIST and CIFAR-10 data sets; compared with first-order optimization algorithms, the multi-batch L-BFGS method performs better in terms of convergence speed and stability.
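To make the first contribution concrete, the following is a minimal sketch of the kind of update the abstract describes: an SVRG-style variance-reduced gradient combined with an exponentially decayed (weighted-average) gradient estimate and a step size adapted from accumulated historical gradients. The callback names grad_fn and full_grad_fn, the AdaGrad-like accumulation, and all hyperparameter values are illustrative assumptions, not the thesis's exact formulation.

    import numpy as np

    def adaptive_svrg(grad_fn, full_grad_fn, w0, n_samples,
                      epochs=10, inner_steps=100, eta0=0.1,
                      beta=0.9, eps=1e-8, seed=0):
        # grad_fn(w, i): stochastic gradient on sample/mini-batch i (hypothetical callback)
        # full_grad_fn(w): full-batch gradient at the snapshot point (hypothetical callback)
        rng = np.random.default_rng(seed)
        w = w0.copy()
        m = np.zeros_like(w0)    # exponentially decayed (attenuated) gradient estimate
        acc = np.zeros_like(w0)  # accumulated squared gradients used to adapt the step size

        for _ in range(epochs):
            w_snap = w.copy()
            mu = full_grad_fn(w_snap)                 # full gradient at the snapshot point
            for _ in range(inner_steps):
                i = rng.integers(n_samples)
                # standard SVRG variance-reduced gradient estimate
                g = grad_fn(w, i) - grad_fn(w_snap, i) + mu
                # weighted average with exponential decay of past estimates
                m = beta * m + (1.0 - beta) * g
                # learning rate adapted from historical gradient information
                acc += m * m
                w -= (eta0 / (np.sqrt(acc) + eps)) * m
        return w

Similarly, a minimal sketch of a multi-batch L-BFGS loop, assuming that consecutive batches share an overlap set on which the curvature pair is computed, and using a geometric learning-rate decay as a stand-in for the fixed decay strategies mentioned above:

    from collections import deque
    import numpy as np

    def two_loop_direction(g, pairs):
        # Standard L-BFGS two-loop recursion: approximates H^{-1} g
        # from the stored curvature pairs (s, y).
        q = g.copy()
        alphas = []
        for s, y in reversed(pairs):             # newest pair first
            rho = 1.0 / (y @ s)
            a = rho * (s @ q)
            alphas.append((rho, a))
            q -= a * y
        s, y = pairs[-1]
        q *= (s @ y) / (y @ y)                   # initial Hessian scaling
        for (s, y), (rho, a) in zip(pairs, reversed(alphas)):  # oldest pair first
            b = rho * (y @ q)
            q += (a - b) * s
        return q

    def multi_batch_lbfgs(grad_fn, w0, batches, eta=0.1, decay=0.95, memory=10):
        # batches: iterable of (batch, overlap) index sets; 'overlap' is shared with
        # the next batch so the curvature pair is formed from consistent gradients.
        w = w0.copy()
        pairs = deque(maxlen=memory)
        for k, (batch, overlap) in enumerate(batches):
            g = grad_fn(w, batch)                                   # gradient on the current batch
            d = two_loop_direction(g, list(pairs)) if pairs else g  # quasi-Newton direction
            w_new = w - eta * (decay ** k) * d                      # fixed geometric decay (assumed)
            s = w_new - w
            y = grad_fn(w_new, overlap) - grad_fn(w, overlap)       # gradient difference on the overlap
            if s @ y > 1e-10:                                       # keep only curvature-consistent pairs
                pairs.append((s, y))
            w = w_new
        return w

Both sketches are schematic: the thesis's concrete weighting scheme, sampling strategy, and decay schedules may differ from the choices assumed here.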
Keywords/Search Tags:Deep learning, Stochastic gradient descent method, Variance attenuation, Learning rate, L-BFGS method