
Research On Optimization Theory And Algorithms For Online Learning

Posted on: 2018-02-24
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W Xue
Full Text: PDF
GTID: 1368330575979578
Subject: Computer Science and Technology
Abstract/Summary:
The continuous development of modes of information dissemination and the rise of technologies such as cloud computing and the Internet of Things have brought human society into the era of big data. Because big data arrives continuously from numerous sources and has low value density, data-processing tools should work in real time, supporting both online processing and online analysis. Online learning adopts a streaming computation model, which yields simple and fast procedures, low learning complexity, and rapid model updates. These advantages make online learning a promising tool for big data, especially for streaming data. Beyond processing data in real time, the generalization ability of models (e.g., learning error, complexity of the model space) is also an emphasis of current research. Regularization is a dominant theme in machine learning owing to its remarkable ability to address problems such as controlling the model space and preventing overfitting during modeling. Since learning problems widely involve large-scale data, developing parsimonious models and efficient algorithms for them is both promising and necessary.

To handle large-scale learning problems, this dissertation further studies and explores online learning within the framework of regularization. Specifically, we put forward several online learning algorithms and apply them to problems such as data classification, prediction, conjoint analysis, and image processing. The main results and innovations of this dissertation are summarized as follows.

(1) We propose a coupled linearized online alternating direction method of multipliers (ADMM) for solving nonsmooth empirical risk minimization problems. The proposed algorithm is simple to operate and easy to implement. We first transform the primal problem into an equivalent constrained minimization problem with a separable structure. Then we split the augmented Lagrangian function of the constrained problem and minimize the resulting subproblems over one variable while fixing the other. By applying three linearized operations per iteration, the method avoids matrix inversion, and each iteration yields a closed-form solution. Through detailed theoretical analysis, we obtain the regret bound of the proposed algorithm and its convergence rate: under some mild conditions, it achieves an O(1/√T) convergence rate for convex learning problems and O((log T)/T) for strongly convex ones, where T is the number of samples. Comparative experiments with several current related methods demonstrate the feasibility and effectiveness of our approach.
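To make the update pattern concrete, here is a minimal sketch of a generic linearized online ADMM on a lasso-type consensus split (streaming squared loss plus lam*||z||_1 subject to w = z). This is an illustration rather than the dissertation's exact coupled linearized scheme; the squared loss, the penalty parameter rho, and the proximal parameter eta are assumptions.

```python
import numpy as np

def soft_threshold(v, tau):
    """Closed-form prox of the l1 norm: sign(v) * max(|v| - tau, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def online_linearized_admm(stream, dim, lam=0.1, rho=1.0, eta=1.0):
    """Illustrative linearized online ADMM for the consensus split w = z,
    minimizing a streaming squared loss plus lam*||z||_1. Each step uses only
    the gradient of the current loss, so every update is a closed form and
    no matrix inversion is required."""
    w = np.zeros(dim)   # smooth-loss variable
    z = np.zeros(dim)   # sparse variable
    u = np.zeros(dim)   # scaled dual variable for the constraint w = z
    for x_t, y_t in stream:
        # Linearize the loss 0.5*(x_t.w - y_t)^2 at the current w.
        grad = (w @ x_t - y_t) * x_t
        # w-update: quadratic after linearization, hence closed form.
        w = (rho * (z - u) - grad + eta * w) / (rho + eta)
        # z-update: l1 prox, also closed form.
        z = soft_threshold(w + u, lam / rho)
        # Dual ascent on the constraint residual.
        u = u + w - z
    return z
```

Every update above is an explicit expression, which is exactly the property the per-iteration linearization is designed to secure.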
(2) We propose an online learning algorithm, named Stochastic Spectral Gradient Descent (S²GD), for learning problems whose objective is formulated as the average of a large number of smooth component functions. The difference between S²GD and traditional stochastic gradient descent (SGD) is that S²GD employs the Rayleigh quotient to collect second-order information and construct Hessian-inverse approximations, thereby designing a new learning stepsize (sketched below). At each iteration, the generated search direction is guaranteed to be a descent direction, and existing results guarantee the convergence of the S²GD method. In fact, S²GD can be viewed as extending the spectral gradient method from the deterministic setting to the stochastic setting. Experimental results on standard data sets demonstrate the feasibility and effectiveness of the proposed algorithm.

(3) We propose two weighted multi-task feature selection models that enhance the sparsity of the learning variables, together with two online algorithms to solve them. To the best of our knowledge, this is the first work on online weighted multi-task feature selection. The main advantages of our approach are that i) it applies when training data arrive sequentially, so the training procedure can be conducted at any time; ii) it can process data of any size with any number of features; and iii) training the learning model is very efficient because the weight updates have closed-form solutions (sketched below). The worst-case time complexity and memory cost of the algorithm at each iteration are both O(N × Q), where N is the number of feature dimensions and Q is the number of tasks. Theoretical analysis of the regret bounds of the proposed algorithms is presented, which also guarantees their convergence rates. Experiments indicate that the proposed algorithms yield better performance, e.g., in terms of convergence speed and sparsity.

(4) Exploiting the fact that noise cannot be sparsely represented over any dictionary, we propose an online dictionary learning algorithm, named Mini-batch K-sparse Dictionary Learning (MKDL), for large-scale image denoising based on sparse representation. We consider a K-sparse optimization model with an ℓ0-norm constraint and solve it by alternating optimization. At each iteration of MKDL, only a small number of training samples are used to encode and update the dictionary; more precisely, iterative hard thresholding and projected gradient descent are employed to optimize these two stages, respectively (a sketch follows below). The concrete operational flow has three steps: i) add noise to the clean image; ii) extract a large number of sub-images (samples) from the noisy image to learn a redundant dictionary; and iii) use the representation coefficients of the noisy image over the learned dictionary to achieve noise reduction. Experimental results on image denoising show much better performance than several existing dictionary learning algorithms, validating the effectiveness of the proposed approach in both convergence speed and denoising quality.
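As an illustration of the spectral-stepsize idea behind S²GD, the sketch below scales plain SGD steps by an inverse Rayleigh quotient, s's / s'y, computed from successive iterate and stochastic-gradient differences, in the spirit of the Barzilai-Borwein spectral gradient method. The dissertation's exact Hessian-inverse construction may differ; grad_fn, the clipping bounds, and the sampling scheme are assumptions.

```python
import numpy as np

def s2gd_sketch(grad_fn, w0, n_samples, n_iters=1000, alpha0=0.1,
                alpha_min=1e-4, alpha_max=1.0, rng=None):
    """Sketch of a stochastic gradient method with a spectral stepsize.
    The Rayleigh quotient s'y / s's estimates curvature along the last
    step; its inverse scales the next step, mimicking a (diagonal)
    Hessian-inverse approximation. grad_fn(w, i) returns the gradient of
    the i-th component function at w."""
    rng = rng or np.random.default_rng(0)
    w = np.asarray(w0, dtype=float)
    alpha = alpha0
    i = rng.integers(n_samples)
    g_prev, w_prev = grad_fn(w, i), w.copy()
    w = w - alpha * g_prev
    for _ in range(n_iters - 1):
        i = rng.integers(n_samples)
        g = grad_fn(w, i)                # gradient of one sampled component
        s, y = w - w_prev, g - g_prev    # iterate and gradient differences
        sy = s @ y
        if sy > 1e-12:                   # keep the curvature estimate positive
            # Inverse Rayleigh quotient, clipped for stability under noise.
            alpha = np.clip((s @ s) / sy, alpha_min, alpha_max)
        w_prev, g_prev = w.copy(), g
        w = w - alpha * g                # SGD step with the spectral stepsize
    return w
```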
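The next sketch shows why the per-iteration cost in (3) can stay at O(N × Q): one gradient step on the current examples followed by a closed-form row-wise shrinkage touches each of the N × Q weights a constant number of times, and row-level shrinkage zeroes whole features across all tasks. The squared loss, the gradient-then-prox (FOBOS-style) update, and the weighted ℓ2,1 penalty are illustrative assumptions, not the dissertation's two specific models.

```python
import numpy as np

def online_weighted_mtfs_step(W, X_t, Y_t, weights, lam=0.01, eta=0.1):
    """One online step of a hypothetical weighted multi-task feature
    selection update. W: (N, Q) weight matrix; X_t: (Q, N), one example per
    task; Y_t: (Q,) targets; weights: (N,) per-feature penalty weights."""
    # Squared-loss gradient for each task's current example: O(N*Q).
    residual = np.einsum('qn,nq->q', X_t, W) - Y_t   # per-task residuals
    G = X_t.T * residual                             # (N, Q) gradient
    W = W - eta * G                                  # gradient step
    # Closed-form weighted l2,1 prox, row by row: O(N*Q) time and memory.
    row_norms = np.linalg.norm(W, axis=1, keepdims=True)
    shrink = np.maximum(0.0, 1.0 - eta * lam * weights[:, None]
                        / np.maximum(row_norms, 1e-12))
    return W * shrink
```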
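Finally, a minimal sketch of one MKDL-style alternation, assuming iterative hard thresholding (IHT) for the K-sparse coding stage and a projected gradient step, with dictionary columns renormalized onto the unit sphere, for the dictionary stage; the stepsizes and the patch-batch interface are hypothetical.

```python
import numpy as np

def hard_threshold(A, K):
    """Keep the K largest-magnitude entries of each column, zero the rest."""
    out = np.zeros_like(A)
    idx = np.argsort(-np.abs(A), axis=0)[:K]
    cols = np.arange(A.shape[1])
    out[idx, cols] = A[idx, cols]
    return out

def mkdl_step(D, X_batch, K, n_iht=10, lr_dict=0.1):
    """One mini-batch alternation of a K-sparse dictionary learner.
    D: (n_pixels, n_atoms) dictionary; X_batch: (n_pixels, batch_size)
    vectorized image patches."""
    lr_code = 1.0 / np.linalg.norm(D, 2) ** 2   # stepsize from spectral norm
    A = np.zeros((D.shape[1], X_batch.shape[1]))
    for _ in range(n_iht):
        # IHT: gradient step on 0.5*||X - D A||_F^2, then project onto
        # the l0 ball ||a||_0 <= K, column by column.
        A = hard_threshold(A + lr_code * D.T @ (X_batch - D @ A), K)
    # Projected gradient step on the dictionary for the same mini-batch.
    D = D + lr_dict * (X_batch - D @ A) @ A.T
    D /= np.maximum(np.linalg.norm(D, axis=0, keepdims=True), 1e-12)
    return D, A
```

In a denoising pipeline, X_batch would hold sub-images extracted from the noisy image, and the learned dictionary with the K-sparse codes would reconstruct the denoised patches.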
Keywords/Search Tags:Online Learning, Alternating Direction Method of Multipliers, Stochastic Gradient Descent, Multi-task Learning, Dictionary Learning