
Research On Escaping Saddle Point Algorithms Based On Zeroth-Order Optimization

Posted on: 2024-01-22
Degree: Master
Type: Thesis
Country: China
Candidate: H L Zhang
Full Text: PDF
GTID: 2568307106481994
Subject: Software engineering
Abstract/Summary:
With the continuous emergence of modern machine learning and deep learning tasks, the training of many models can be cast as an optimization problem with a non-convex loss function. Non-convex optimization has therefore received extensive attention in both academia and industry and has become a popular research direction. Traditional first-order optimization algorithms, represented by gradient descent, have been proven to converge to a first-order stationary point in polynomial time. In non-convex functions, however, a first-order stationary point may be not only a local minimum but also a saddle point. Especially in high-dimensional non-convex optimization problems, the ubiquity of saddle points leads to highly suboptimal solutions. How to escape from saddle points effectively has thus become an important research topic in non-convex optimization in recent years.

Beyond the saddle-point problem, many optimization problems do not expose the internal information of the model needed to compute gradients directly. These are known as black-box optimization problems and usually require sampling-based methods. Zeroth-order optimization methods were proposed to solve them: they optimize by querying only the model output, thereby avoiding gradient computation. This approach applies not only to black-box optimization problems but also to problems where gradient computation is prohibitively expensive, and algorithms based on zeroth-order optimization methods have been widely studied and applied.

This paper studies zeroth-order optimization algorithms for escaping saddle points in non-convex optimization, as follows:

(1) For the saddle-point escaping problem in the online stochastic and the offline non-stochastic optimization scenarios, this paper proposes a zeroth-order online negative curvature search algorithm and a zeroth-order non-stochastic negative curvature search algorithm, respectively. Given a point on a smooth, Hessian-Lipschitz non-convex function, a zeroth-order negative curvature finding algorithm uses only zeroth-order information about the model to detect whether a negative curvature direction exists at that point, and thereby determines whether the point is a local minimum or a saddle point. This paper applies the proposed zeroth-order negative curvature finding algorithms to four zeroth-order algorithms (zeroth-order gradient descent, zeroth-order stochastic gradient descent, zeroth-order stochastically controlled stochastic gradient, and zeroth-order stochastic path-integrated differential estimator) and proves that all four converge to second-order stationary points. Theoretical proofs and numerical experiments show that the proposed algorithms achieve lower function query complexity in converging to a second-order stationary point.
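To make the negative curvature finding idea concrete, the sketch below estimates the curvature of f along a direction v from three function queries and probes random unit directions for a direction of negative curvature. This random-probe version is a minimal sketch under stated assumptions (smooth f, illustrative step size, probe count, and threshold); it is a simplified stand-in, not the thesis's online or non-stochastic search algorithms.

```python
import numpy as np

def zo_curvature(f, x, v, mu=1e-3):
    """Zeroth-order curvature estimate along direction v:
    v^T H(x) v ~= (f(x + mu*v) - 2*f(x) + f(x - mu*v)) / mu^2
    (second-order central difference; exact for quadratics)."""
    return (f(x + mu * v) - 2.0 * f(x) + f(x - mu * v)) / mu ** 2

def zo_detect_negative_curvature(f, x, n_probes=100, threshold=-1e-3,
                                 mu=1e-3, rng=None):
    """Illustrative zeroth-order negative curvature probe: sample random
    unit directions and test the finite-difference curvature along each.
    Returns a direction whose curvature falls below `threshold` if one is
    found (the point then behaves like a saddle), otherwise None (the
    point is treated as an approximate local minimum)."""
    rng = rng or np.random.default_rng(0)
    d = x.size
    for _ in range(n_probes):
        v = rng.standard_normal(d)
        v /= np.linalg.norm(v)
        if zo_curvature(f, x, v, mu) < threshold:
            return v  # negative curvature direction: escape along +/-v
    return None

# At the saddle of f(x) = x0^2 - x1^2 the probe finds an escape direction.
f = lambda x: x[0] ** 2 - x[1] ** 2
print(zo_detect_negative_curvature(f, np.zeros(2)))
```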
(2) In the non-stochastic optimization scenario, this paper further studies two zeroth-order algorithms based on Nesterov's accelerated gradient descent that converge to second-order stationary points. It first studies a zeroth-order version of the perturbed accelerated gradient descent method using the central finite difference version of the coordinate-wise gradient estimator, and proves that this algorithm converges to an approximate second-order stationary point with lower function query complexity. Because negative curvature finding is efficient at locating the most negative curvature direction near a saddle point, this paper then studies a zeroth-order version of perturbed accelerated gradient descent with an accelerated negative curvature finding subroutine, which uses the finite difference of two coordinate-wise gradient estimators to approximate the Hessian-vector product. It is proven that this algorithm improves the function query complexity of the previous algorithm by a polylogarithmic factor. Finally, numerical experiments verify the efficiency of the proposed algorithms in escaping saddle points.
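As a concrete illustration of these two building blocks, the sketch below implements a coordinate-wise central finite difference gradient estimator and a Hessian-vector product formed as the finite difference of two such gradient estimates. This is a minimal sketch: the function names and the step sizes mu and delta are illustrative assumptions, not the thesis's exact construction or tuned parameters.

```python
import numpy as np

def coord_gradient(f, x, mu=1e-5):
    """Coordinate-wise central finite difference gradient estimator:
    g_i = (f(x + mu*e_i) - f(x - mu*e_i)) / (2*mu), costing 2d queries."""
    d = x.size
    g = np.zeros(d)
    for i in range(d):
        e = np.zeros(d)
        e[i] = mu
        g[i] = (f(x + e) - f(x - e)) / (2 * mu)
    return g

def zo_hessian_vector(f, x, v, mu=1e-5, delta=1e-3):
    """Hessian-vector product via the finite difference of two
    coordinate-wise gradient estimates:
    H v ~= (g(x + delta*v) - g(x - delta*v)) / (2*delta)."""
    return (coord_gradient(f, x + delta * v, mu)
            - coord_gradient(f, x - delta * v, mu)) / (2 * delta)

# Toy check on f(x) = x0^2 - x1^2, whose Hessian is diag(2, -2).
f = lambda x: x[0] ** 2 - x[1] ** 2
print(zo_hessian_vector(f, np.zeros(2), np.array([0.0, 1.0])))
# ~ [0, -2]: negative curvature along the second coordinate.
```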
Keywords/Search Tags:non-convex optimization, zeroth-order optimization, escaping saddle points, negative curvature finding