
On The Convergence Of First-Order Methods In Machine Learning

Posted on: 2019-04-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z Y Chen
Full Text: PDF
GTID: 1368330572969073
Subject: Computer application technology

Abstract/Summary:
First-order optimization methods have been widely used in machine learning, owing to benefits including, but not limited to, weak assumptions on the objective function, fast convergence rates, and easy implementation in practice. However, traditional first-order methods face several practical issues. Since data sizes and problem dimensions have grown dramatically in recent years, deterministic first-order methods are no longer capable of handling optimization tasks in machine learning. Moreover, the convergence analysis of most first-order methods is based on the worst case, which rarely occurs in real-world applications; adaptive methods such as ADAGRAD therefore usually perform better on machine learning tasks by choosing the learning rate adaptively according to the learning history. However, most of the aforementioned methods have no theoretical guarantee of convergence in the non-convex setting, which covers many learning tasks such as deep learning. Motivated by these facts, this thesis develops more efficient and broadly applicable first-order optimization algorithms with solid theoretical analysis.

The contributions of this thesis are fourfold.
1) We analyze the rank minimization problem using the Kurdyka-Lojasiewicz (KL) property and show that polynomial penalties on singular values satisfy the KL property. A proximal-type algorithm is then proposed with an asymptotic convergence rate of O(1/ε) when the gradient of the smooth component of the objective is Lipschitz continuous.
2) A strongly adaptive stochastic subgradient method, SADAGRAD, is proposed under strong convexity and local error bound conditions, and it enjoys an adaptive convergence rate in theory. In the worst case, SADAGRAD achieves iteration complexities of O(1/ε) under strong convexity and O(1/ε^{2(1-α)}) under the local error bound condition, where α ∈ [0,1) is a parameter of the error bound condition. More importantly, both the convergence rate and the dependence on the dimensionality benefit from sparse stochastic gradients, which makes SADAGRAD converge faster in practice.
3) For non-convex problems, a universal stagewise learning framework is proposed. Many off-the-shelf convex optimization methods, including adaptive methods, can be used to solve non-convex problems under this framework, which enriches the choices available in the non-convex setting (a minimal illustrative sketch is given after this list).
4) Finally, we propose a stagewise accelerated variance-reduced gradient method, Stagewise-Katyusha, for the non-convex setting. When the μ-weakly convex objective function is the sum of n L-smooth components plus some relatively simple terms and L/μ ≥ 4n/3, Stagewise-Katyusha attains the lower-bound computational complexity while requiring lower memory cost than SAGA-style state-of-the-art methods.
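To make the stagewise idea in contribution 3 concrete, the following is a minimal sketch, not the thesis's actual algorithm: at each stage a quadratic proximal term centered at the current reference point convexifies the weakly convex objective, and the resulting subproblem is handed to an off-the-shelf solver (plain SGD here; an adaptive method would slot in the same way). The function name stagewise_solver and the parameters gamma, eta0, n_stages, and inner_steps are illustrative assumptions.

```python
import numpy as np

def stagewise_solver(stoch_grad, x0, gamma=1.0, n_stages=10, inner_steps=100, eta0=0.1):
    """Hypothetical sketch of a stagewise framework for a weakly convex objective.

    At stage s, the subproblem  f(x) + ||x - x_ref||^2 / (2 * gamma)  is made convex
    by the quadratic term and is approximately solved here with plain SGD; any
    off-the-shelf convex (or adaptive) solver could be plugged in instead.
    """
    x_ref = np.asarray(x0, dtype=float)
    for s in range(1, n_stages + 1):
        x = x_ref.copy()
        eta = eta0 / s                      # illustrative stagewise step-size decay
        for _ in range(inner_steps):
            # stochastic gradient of the regularized (convex) subproblem
            g = stoch_grad(x) + (x - x_ref) / gamma
            x = x - eta * g
        x_ref = x                           # next stage restarts from the subproblem output
    return x_ref

# Usage example on a toy least-squares problem (stochastic gradient from one sampled row).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A, b = rng.normal(size=(200, 10)), rng.normal(size=200)

    def stoch_grad(x):
        i = rng.integers(len(b))
        return (A[i] @ x - b[i]) * A[i]

    x_hat = stagewise_solver(stoch_grad, np.zeros(10))
```

The design point the sketch illustrates is that the outer loop is solver-agnostic: only the inner loop changes when a different convex method is substituted.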
Keywords/Search Tags: Convex Optimization, Error Bound, First-Order Method, KL Inequality, Non-convex Optimization, Stochastic First-Order Method, Variance Reduction