
On The Convergence Of First-Order Methods In Machine Learning

Posted on: 2019-04-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z Y Chen
Full Text: PDF
GTID: 1368330572969073
Subject: Computer application technology

Abstract/Summary:
First-order optimization methods have been widely used in machine learning, owing to benefits including, but not limited to, weak assumptions on the objective function, fast convergence rates, and easy implementation in practice. However, traditional first-order methods face several practical issues. Since data sizes and problem dimensions have grown dramatically in recent years, deterministic first-order methods are no longer capable of handling optimization tasks in machine learning. Moreover, the convergence analysis of most first-order methods is based on the worst case, which rarely occurs in real-world applications; adaptive methods such as ADAGRAD therefore usually perform better on machine learning tasks by choosing the learning rate adaptively according to the learning history. However, most of the aforementioned methods have no theoretical guarantee of convergence in the non-convex setting, which covers many learning tasks such as deep learning. Motivated by these facts, this thesis develops more efficient and broadly applicable first-order optimization algorithms with solid theoretical analysis.

The contributions of this thesis are fourfold.
1) We analyze the rank minimization problem using the Kurdyka-Lojasiewicz (KL) property and show that polynomial penalties on singular values satisfy the KL property. A proximal-type algorithm is then proposed with an asymptotic convergence rate of O(1/ε) when the gradient of the smooth component of the objective is Lipschitz continuous.
2) A strongly adaptive stochastic subgradient method, SADAGRAD, is proposed under strong convexity and local error bound conditions, and it enjoys an adaptive convergence rate in theory. In the worst case, SADAGRAD achieves iteration complexities of O(1/ε) under strong convexity and O(1/ε^{2(1-α)}) under the local error bound condition, where α ∈ [0,1) is a parameter of the error bound condition. More importantly, both the convergence rate and the dependence on the dimensionality benefit from sparse stochastic gradients, which makes SADAGRAD converge faster in practice.
3) For non-convex problems, a universal stagewise learning framework is proposed. Many off-the-shelf convex optimization methods, including adaptive methods, can be used to solve non-convex problems under this framework, which enriches the choices available in the non-convex setting (a minimal illustrative sketch is given after this list).
4) Finally, we propose a stagewise accelerated variance-reduced gradient method, Stagewise-Katyusha, for the non-convex setting. When the μ-weakly convex objective function is the sum of n L-smooth components plus some relatively simple terms and L/μ ≥ 4n/3, Stagewise-Katyusha attains the lower-bound computational complexity while requiring lower memory cost than SAGA-style state-of-the-art methods.
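To make the stagewise idea in contribution 3 concrete, the following is a minimal sketch, not the thesis's actual algorithm: at each stage a quadratic proximal term centered at the current reference point convexifies the weakly convex objective, and the resulting subproblem is handed to an off-the-shelf solver (plain SGD here; an adaptive method would slot in the same way). The function name stagewise_solver and the parameters gamma, eta0, n_stages, and inner_steps are illustrative assumptions.

```python
import numpy as np

def stagewise_solver(stoch_grad, x0, gamma=1.0, n_stages=10, inner_steps=100, eta0=0.1):
    """Hypothetical sketch of a stagewise framework for a weakly convex objective.

    At stage s, the subproblem  f(x) + ||x - x_ref||^2 / (2 * gamma)  is made convex
    by the quadratic term and is approximately solved here with plain SGD; any
    off-the-shelf convex (or adaptive) solver could be plugged in instead.
    """
    x_ref = np.asarray(x0, dtype=float)
    for s in range(1, n_stages + 1):
        x = x_ref.copy()
        eta = eta0 / s                      # illustrative stagewise step-size decay
        for _ in range(inner_steps):
            # stochastic gradient of the regularized (convex) subproblem
            g = stoch_grad(x) + (x - x_ref) / gamma
            x = x - eta * g
        x_ref = x                           # next stage restarts from the subproblem output
    return x_ref

# Usage example on a toy least-squares problem (stochastic gradient from one sampled row).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A, b = rng.normal(size=(200, 10)), rng.normal(size=200)

    def stoch_grad(x):
        i = rng.integers(len(b))
        return (A[i] @ x - b[i]) * A[i]

    x_hat = stagewise_solver(stoch_grad, np.zeros(10))
```

The design point the sketch illustrates is that the outer loop is solver-agnostic: only the inner loop changes when a different convex method is substituted.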
Keywords/Search Tags: Convex Optimization, Error Bound, First-Order Method, KL Inequality, Non-convex Optimization, Stochastic First-Order Method, Variance Reduction