
Research On Large-scale Regularized Machine Learning Algorithms

Posted on: 2018-06-06  Degree: Doctor  Type: Dissertation
Country: China  Candidate: L B Qiao  Full Text: PDF
GTID: 1368330623450390  Subject: Computer Science and Technology
Abstract/Summary:
With the growth of high-dimensional data in academia and industry, traditional machine learning algorithms face new challenges ranging from statistical modeling to optimization, and machine learning algorithms with composite regularization have attracted considerable attention due to their generalization capacity, interpretability, and potential computational advantages. Compositely regularized statistical models explicitly encode structural information among variables by adding specific regularization terms. Motivated by the goal of improving models' expressive ability, researchers have proposed a variety of regularization functions, which have greatly improved the performance of traditional machine learning algorithms by exploiting specific structural information. However, in many real-world applications it is computationally difficult to solve compositely regularized minimization problems at large scale. Specifically, the challenges are two-fold: one is the large number of data samples, and the other is the high dimension of the model variables. Moreover, in real-world applications the objective functions can be computationally difficult to handle because of non-linearity, non-convexity, non-smoothness, and the composition of the regularization terms. In this thesis, we focus on solving these kinds of problems and propose several online and distributed algorithms, together with convergence analyses, for large-scale machine learning. The proposed algorithms are summarized as follows:

1. We consider a wide spectrum of regularized stochastic minimization problems in which the regularization term is composed with a linear function. Examples of this formulation include graph-guided regularized minimization, the generalized Lasso, and a class of l1-regularized problems. The computational challenge is that a closed-form solution of the proximal mapping associated with the regularization term is not available because of the imposed linear composition. Fortunately, the structure of the regularization term allows us to reformulate the problem as a convex-concave saddle-point problem, which can be solved with the Primal-Dual Hybrid Gradient (PDHG) approach (as sketched below). However, this approach may be inefficient in realistic applications because computing the full gradient of the expected objective function can be very expensive when the number of input data samples is large. To address this issue, we propose a Stochastic Primal-Dual Hybrid Gradient (SPDHG) algorithm with either uniformly or non-uniformly averaged iterates. With uniformly averaged iterates, SPDHG converges in expectation at an O(1/√t) rate for general convex objectives and at an O(log(t)/t) rate for strongly convex objectives; with non-uniformly averaged iterates, it converges in expectation at an O(1/t) rate for strongly convex objectives. Numerical experiments on different genres of datasets demonstrate that the proposed algorithm outperforms competing algorithms.

2. We propose a Stochastic ExtraGradient-based Alternating Direction Method (SEGADM) to minimize the sum of two possibly nonsmooth convex objective functions subject to linear linking constraints, where one of the two functions is the expectation of a stochastic composite function and the other has a relatively easy proximal map. The SEGADM optimization model abstracts a number of important applications in machine learning, such as the fused Lasso, fused logistic regression, and graph-guided regularized minimization. We study the convergence of SEGADM with either uniformly or non-uniformly averaged iterates. With uniformly averaged iterates, SEGADM converges in expectation at an O(1/√t) rate for general convex objectives and at an O(log(t)/t) rate for strongly convex objectives; with non-uniformly averaged iterates, it converges in expectation at an O(1/t) rate for strongly convex objectives. Experimental results on different genres of datasets demonstrate that the proposed algorithm outperforms competing algorithms.

3. We consider a wide range of non-convex regularized minimization problems arising in sparse learning, in which the non-convex regularization term is composed with a linear function. Recent theoretical investigations have demonstrated the superiority of such formulations over their convex counterparts. The computational challenge lies in the fact that the proximal mapping associated with the non-convex regularization is not easily obtained because of the imposed linear composition. Fortunately, the problem structure allows one to introduce an auxiliary variable and reformulate the problem as an optimization problem with linear constraints, which can be solved using the Linearized Alternating Direction Method of Multipliers (LADMM); a minimal numerical sketch is given below. Despite the practical success of LADMM, it remained unknown whether LADMM converges when solving such non-convex compositely regularized optimization problems. In this thesis, we first present a detailed convergence analysis of LADMM for a non-convex compositely regularized optimization problem with a large class of non-convex penalties. Furthermore, we propose an Adaptive LADMM (AdaL-ADMM) algorithm with a line-search criterion. Experimental results on different genres of datasets validate the efficacy of the proposed algorithm.

4. We focus on minimizing the sum of a smooth nonlinear (possibly non-convex) function and a block-separable (possibly non-convex, nonsmooth) function involving a large number of block variables, subject to inequality constraints. Recent theoretical analysis has demonstrated that solutions of this problem are statistically superior to those obtained from its convex counterparts. However, solving such a nonsmooth nonlinear optimization problem with inequality constraints remains a major challenge. To this end, we propose a new distributed first-order optimization method, referred to as the parallel primal-dual coordinate method of multipliers, which alternates between minimizing over a random subset of the primal variables and a gradient-type update of the dual variable. We further present an asymptotic convergence analysis of the proposed method under both cyclic and randomized block variable selection rules. Experiments on constrained folded-concave penalized linear regression validate the efficacy of the proposed method, and its distributed variant achieves a linear speedup.
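As a point of reference for contribution 1, the following is a generic sketch of the saddle-point reformulation that underlies PDHG-type methods for an objective f(x) + g(Ax) with a linearly composed regularizer; it is not taken from the thesis. The symbols f, g, A and the step sizes σ, τ, θ are generic placeholders, and the SPDHG algorithm described above further replaces exact computations involving the expected loss with stochastic estimates.

\[
  \min_x \; f(x) + g(Ax)
  \quad\Longleftrightarrow\quad
  \min_x \max_y \; f(x) + \langle Ax, y \rangle - g^{*}(y),
\]
where $g^{*}$ denotes the convex conjugate of $g$. A standard deterministic primal-dual hybrid gradient iteration then reads
\[
  y^{k+1} = \operatorname{prox}_{\sigma g^{*}}\!\big(y^{k} + \sigma A\bar{x}^{k}\big), \qquad
  x^{k+1} = \operatorname{prox}_{\tau f}\!\big(x^{k} - \tau A^{\top} y^{k+1}\big), \qquad
  \bar{x}^{k+1} = x^{k+1} + \theta\,(x^{k+1} - x^{k}).
\]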
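As a concrete illustration of the LADMM scheme referenced in contribution 3, here is a minimal Python sketch for the convex generalized-Lasso special case, min_w 0.5*||Aw - b||^2 + lam*||Dw||_1, obtained by introducing an auxiliary variable z = Dw with the constraint Dw - z = 0. This is not the thesis's implementation: the w-subproblem is replaced by a single linearized gradient step, and the function names, step sizes, and problem sizes are illustrative assumptions.

# Minimal linearized-ADMM-style sketch for the generalized Lasso
#   min_w  0.5*||A w - b||^2 + lam*||D w||_1,   with z = D w.
# (Illustrative sketch only; not the algorithm or parameters from the thesis.)
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def linearized_admm(A, b, D, lam=0.1, rho=1.0, n_iter=500):
    n = A.shape[1]
    w = np.zeros(n)
    z = np.zeros(D.shape[0])
    u = np.zeros(D.shape[0])  # scaled dual variable for D w - z = 0
    # Conservative step size from spectral norms so the linearized
    # w-step is a valid gradient step on the augmented Lagrangian.
    tau = 1.0 / (np.linalg.norm(A, 2) ** 2 + rho * np.linalg.norm(D, 2) ** 2)
    for _ in range(n_iter):
        # w-step: one gradient step on the smooth loss plus the
        # linearized augmented penalty (instead of an exact solve).
        grad = A.T @ (A @ w - b) + rho * D.T @ (D @ w - z + u)
        w = w - tau * grad
        # z-step: exact proximal map of (lam/rho)*||.||_1.
        z = soft_threshold(D @ w + u, lam / rho)
        # Dual ascent on the constraint D w - z = 0.
        u = u + D @ w - z
    return w

# Usage sketch on synthetic data (all sizes are illustrative assumptions).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((200, 50))
    w_true = np.zeros(50)
    w_true[:5] = 1.0
    b = A @ w_true + 0.01 * rng.standard_normal(200)
    D = np.eye(50)  # D = I recovers the ordinary Lasso
    w_hat = linearized_admm(A, b, D, lam=1.0)
    print(np.round(w_hat[:10], 3))

Replacing D with a graph incidence or difference matrix yields graph-guided or fused-Lasso-type penalties; the non-convex penalties studied in the thesis would additionally change the z-step to the proximal map of the chosen non-convex regularizer.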
Keywords/Search Tags: large-scale machine learning, massive dataset, high-dimensional model, composite regularization, convex optimization, nonconvex optimization, nonsmooth, ADMM, PDHG, BCD, stochastic optimization, asynchronous parallel, convergence analysis