Convergence Analysis Of SGD And Ada Grad Algorithms In Nonconvex Settings

Posted on:2024-07-11

Degree:Master

Type:Thesis

Country:China

Candidate:T R Xing

Full Text:PDF

GTID:2568307112489674

Subject:Computational Mathematics

Abstract/Summary:

Gradient-like optimization algorithm is a simple and efficient method for solving large-scale optimization problems,which has been widely used in machine learning.These algo-rithms minimize a loss function by iteratively adjusting model parameters so that the model can better fit the training data.Among them,stochastic gradient descent（SGD）is a very com-mon optimization algorithm in neural network model training.Its main idea is to use part of the training samples to calculate the gradient in each iteration,thereby reducing the calculation cost and speeding up the convergence speed.When analyzing the convergence of SGD,it is usually assumed that the stochastic gradient is an unbiased estimate of the batch gradient of the objective function,because using partial sample gradients instead of batch gradients will introduce noise,which may affect the performance of the algorithm.However,in large-scale optimization problems,this assumption is difficult to satisfy.In order to solve the above prob-lems,this paper analyzes the convergence of the non-convex function F under the assumption that the stochastic gradient is a biased estimate of the batch gradient of the objective function,and finally obtains a gradual convergence result.However,the algorithm needs to manually adjust the step size to achieve the best training effect,and may fall into a local optimal solution,especially in the later stage of training.The adaptive gradient descent method can adaptively adjust the step size according to his-torical information,reducing the cost,so it can converge to the optimal solution faster.Such algorithms are widely used in computer vision and pattern recognition tasks.However,so far,we still lack a theoretical analysis of the convergence of such algorithms when the objective function is a non-convex function.Therefore,this article investigates the Ada Grad algorithm based on batch gradients and the Ada Grad algorithm based on stochastic gradient for non-convex objective function,respectively.For the Ada Grad algorithm with batch gradient,we studied the global step size and coordinate-wise step size respectively,and provided asymptotic convergence results under the assumption that the objective function is L-smooth.For the Ada-Grad algorithm with stochastic gradient,we also studied the global step size and coordinate-wise step size separately.Under the assumption that the objective function is L-smooth,the stochastic gradient is an unbiased estimate of the batch gradient of the objective function,and the second order moment of the stochastic gradient is uniformly bounded,we obtain the rate of non-asymptotic convergence of O(lnT/T^1/2).

Keywords/Search Tags:

Nonconvex optimization, SGD, Ada Grad, Convergence, Machine learning

Related items

1	Research On Large-scale Regularized Machine Learning Algorithms
2	Nonconvex And Stochastic Algorithms In Machine Learning Research
3	Nonconvex And Nonsmooth Optimization Problems In Image Processing And Machine Learning:Algorithms And Applications
4	Global Convergence Analysis Of Nonconvex Sparse Regularization Problems
5	On Nonsmooth Nonconvex Optimization Problems With Lagrange Neural Network
6	Neural Network Algorithm For Nonconvex Optimization Problems And Its Applications
7	Research On The Problem Of Nonconvex Convolutional Sparse Coding Based On Slices
8	Research Of Nonconvex Convolutional Dictionary Learning Problem Based On Proximal Gradient Algorithm
9	Research On Algorithms For Solving Minimax Problems In Machine Learning
10	Image Reconstruction Research Based On Nonconvex Local Convolutional Sparse Coding