
Variance Reducing Stochastic Method And Its Applications In Large-Scale Machine Learning

Posted on: 2020-03-18    Degree: Master    Type: Thesis
Country: China    Candidate: J C Liu    Full Text: PDF
GTID: 2428330572974161    Subject: Computer application technology
Abstract/Summary:
Optimization is a key step in machine learning; almost all machine learning problems involve optimization. As a result, designing fast and efficient optimization algorithms is central to machine learning research. We consider two different types of optimization problems and develop corresponding algorithms.

First, we consider a fundamental problem in machine learning: regularized empirical risk minimization. In this thesis, the regularized empirical risk is written as the average of a large number of convex loss functions plus a possibly non-smooth convex regularization term. To solve this problem faster and more efficiently, we propose an accelerated variance reducing (VR) stochastic method called Prox2-SAGA. Unlike traditional VR stochastic methods, Prox2-SAGA replaces the stochastic gradient of the loss function with the corresponding gradient mapping, and additionally computes the proximal operator of the regularization term. These two calculations constitute a Douglas-Rachford splitting step. For strongly convex and smooth loss functions, we prove that Prox2-SAGA achieves a linear convergence rate comparable to the fastest accelerated VR stochastic methods. Prox2-SAGA is also more practical, since only the stepsize needs to be tuned. When each loss function is smooth but not strongly convex, we prove a convergence rate of O(1/k) for the proposed Prox2-SAGA method, where k is the number of iterations. Moreover, experiments show that Prox2-SAGA remains effective for non-smooth loss functions and, for strongly convex and smooth loss functions, is markedly faster when the losses are ill-conditioned.

Often, the regulariser in a regularized empirical risk function takes a composite form in order to encode important structural information about the problem or data. We consider optimization problems with such regularisers in the second part of this thesis. Traditionally, a proximal operator is used to handle the non-smoothness of the regulariser, but in general the proximal operator of a composite regulariser is hard to compute. Recently, the proximal average (PA), which only requires tractable proximal operator calculations, has been proposed to approximate composite regularisers. Combined with the prevailing variance reducing (VR) stochastic methods (e.g. SVRG, SAGA), PA-based algorithms achieve better performance. However, existing works require a fixed stepsize, which must be rather small to ensure that the PA approximation is sufficiently accurate; this smaller stepsize in turn incurs many more iterations before convergence. In this thesis, we propose two fast PA-based VR stochastic methods, APA-SVRG and APA-SAGA. By initializing the stepsize with a much larger value and adaptively decreasing it, both proposed methods are proved to enjoy an O(n log(1/ε) + m_0/ε) iteration complexity to achieve ε-accurate solutions, where m_0 is the initial number of inner iterations and n is the number of samples. Moreover, experimental results demonstrate the superiority of the proposed algorithms.
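To make the ingredients above concrete, the following is a minimal Python sketch of one variance-reduced proximal step in the spirit of the methods described: a SAGA-style stochastic gradient corrected by a table of stored gradients, followed by the proximal operator of the regulariser. The least-squares loss, the l1 regulariser, the stepsize, and the names soft_threshold and prox_saga_step are illustrative assumptions; this is not the thesis's actual Prox2-SAGA update.

import numpy as np

def soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_1 (soft-thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def prox_saga_step(x, A, b, grad_table, grad_avg, lam, stepsize, rng):
    # One variance-reduced proximal step for (1/n) * sum_i 0.5*(a_i^T x - b_i)^2 + lam*||x||_1.
    n = A.shape[0]
    i = rng.integers(n)
    g_new = A[i] * (A[i] @ x - b[i])                # fresh stochastic gradient of loss_i
    v = g_new - grad_table[i] + grad_avg            # SAGA variance-reduced direction
    grad_avg = grad_avg + (g_new - grad_table[i]) / n
    grad_table[i] = g_new.copy()                    # update the stored gradient for sample i
    x = soft_threshold(x - stepsize * v, stepsize * lam)  # proximal step on the regulariser
    return x, grad_avg

For the composite regularisers treated in the second part, the proximal-average idea replaces the (often intractable) proximal operator of a sum of simple regularisers by the average of their individual proximal operators, each of which is cheap to evaluate. A one-line sketch, again with assumed names, is:

def proximal_average(x, prox_list, stepsize):
    # Approximate the prox of the averaged regulariser by averaging the component proxes.
    return sum(p(x, stepsize) for p in prox_list) / len(prox_list)

The adaptive stepsize schedule that distinguishes APA-SVRG and APA-SAGA is not reproduced here.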
Keywords/Search Tags: Variance reduction (VR), Acceleration, Douglas-Rachford splitting, Proximal operator, Proximal average, Adaptive stepsize