
Variance Reducing Stochastic Method And Its Applications In Large-Scale Machine Learning

Posted on: 2020-03-18    Degree: Master    Type: Thesis
Country: China    Candidate: J C Liu    Full Text: PDF
GTID: 2428330572974161    Subject: Computer application technology
Abstract/Summary:
Optimization is a key step in machine learning; almost all machine learning problems involve optimization. As a result, designing fast and efficient optimization algorithms is central to machine learning research. We consider two different types of optimization problems and develop corresponding algorithms.

First, we consider a fundamental problem in machine learning: regularized empirical risk minimization. In this thesis, the regularized empirical risk is written as the average of a large number of convex loss functions plus a possibly non-smooth convex regularization term. To solve this problem faster and more efficiently, we propose an accelerated variance reducing (VR) stochastic method called Prox2-SAGA. Unlike traditional VR stochastic methods, Prox2-SAGA replaces the stochastic gradient of the loss function with the corresponding gradient mapping, and additionally computes the proximal operator of the regularization term. These two calculations constitute a Douglas-Rachford splitting step. For strongly convex and smooth loss functions, we prove that Prox2-SAGA achieves a linear convergence rate comparable to the fastest accelerated VR stochastic methods. Prox2-SAGA is also more practical, since only the stepsize needs to be tuned. When each loss function is smooth but not strongly convex, we prove a convergence rate of O(1/k) for the proposed Prox2-SAGA method, where k is the number of iterations. Moreover, experiments show that Prox2-SAGA remains effective for non-smooth loss functions and, for strongly convex and smooth loss functions, is markedly faster when the losses are ill-conditioned.

Often, the regulariser in a regularized empirical risk function takes a composite form in order to encode important structural information about the problem or data. We consider optimization problems with such regularisers in the second part of this thesis. Traditionally, a proximal operator is used to handle the non-smoothness of the regulariser, but in general the proximal operator of a composite regulariser is hard to compute. Recently, the proximal average (PA), which only requires tractable proximal operator calculations, has been proposed to approximate composite regularisers. Combined with the prevailing variance reducing (VR) stochastic methods (e.g. SVRG, SAGA), PA-based algorithms achieve better performance. However, existing works require a fixed stepsize, which must be rather small to ensure that the PA approximation is sufficiently accurate; this smaller stepsize in turn incurs many more iterations before convergence. In this thesis, we propose two fast PA-based VR stochastic methods, APA-SVRG and APA-SAGA. By initializing the stepsize with a much larger value and adaptively decreasing it, both proposed methods are proved to enjoy an O(n log(1/ε) + m_0/ε) iteration complexity to achieve ε-accurate solutions, where m_0 is the initial number of inner iterations and n is the number of samples. Moreover, experimental results demonstrate the superiority of the proposed algorithms.
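To make the ingredients above concrete, the following is a minimal Python sketch of one variance-reduced proximal step in the spirit of the methods described: a SAGA-style stochastic gradient corrected by a table of stored gradients, followed by the proximal operator of the regulariser. The least-squares loss, the l1 regulariser, the stepsize, and the names soft_threshold and prox_saga_step are illustrative assumptions; this is not the thesis's actual Prox2-SAGA update.

import numpy as np

def soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_1 (soft-thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def prox_saga_step(x, A, b, grad_table, grad_avg, lam, stepsize, rng):
    # One variance-reduced proximal step for (1/n) * sum_i 0.5*(a_i^T x - b_i)^2 + lam*||x||_1.
    n = A.shape[0]
    i = rng.integers(n)
    g_new = A[i] * (A[i] @ x - b[i])                # fresh stochastic gradient of loss_i
    v = g_new - grad_table[i] + grad_avg            # SAGA variance-reduced direction
    grad_avg = grad_avg + (g_new - grad_table[i]) / n
    grad_table[i] = g_new.copy()                    # update the stored gradient for sample i
    x = soft_threshold(x - stepsize * v, stepsize * lam)  # proximal step on the regulariser
    return x, grad_avg

For the composite regularisers treated in the second part, the proximal-average idea replaces the (often intractable) proximal operator of a sum of simple regularisers by the average of their individual proximal operators, each of which is cheap to evaluate. A one-line sketch, again with assumed names, is:

def proximal_average(x, prox_list, stepsize):
    # Approximate the prox of the averaged regulariser by averaging the component proxes.
    return sum(p(x, stepsize) for p in prox_list) / len(prox_list)

The adaptive stepsize schedule that distinguishes APA-SVRG and APA-SAGA is not reproduced here.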
Keywords/Search Tags: Variance reduction (VR), Acceleration, Douglas-Rachford splitting, Proximal operator, Proximal average, Adaptive stepsize