
The Study Of Large Scale Sparse Learning:Optimization And Its Applications

Posted on: 2018-02-18
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W Z Zhang
Full Text: PDF
GTID: 1318330512499428
Subject: Computer application technology

Abstract/Summary:
Large-scale sparse learning is one of the most important methods in machine learning and has been widely used in many real-world applications, such as text mining, bioinformatics, image processing, and news recommendation. However, in large-scale applications involving a huge number of samples and extremely high-dimensional features, training sparse learning models remains challenging. Large-scale sparse learning has therefore been a research hotspot in both academia and industry for several years.

Most existing training algorithms for large-scale sparse learning are based on stochastic composite optimization. However, due to limitations of the online-to-batch conversion, they often fail to deliver sparse solutions. To address this problem, we propose a simple yet effective stochastic composite optimization scheme that adds a novel, powerful sparse online-to-batch conversion to general stochastic optimization algorithms. We further develop three concrete algorithms under this scheme to demonstrate its effectiveness. Both theoretical analysis and experimental results show that our methods outperform existing methods in their ability to learn sparse models, while decreasing the high-probability bound by one order of magnitude.

Screening is an emerging technique that has been shown to be promising in accelerating the training of large-scale sparse learning models. Screening methods quickly identify and remove irrelevant features or samples from the model, thereby shrinking the model size and improving training efficiency. Unfortunately, existing screening methods study either feature screening or sample screening individually, so they are of little help in scenarios with both a huge sample size and high-dimensional features. We propose a novel screening approach for accelerating sparse support vector machine training that, based on accurate estimations of the primal and dual optima, simultaneously identifies the features and samples that are guaranteed to be irrelevant to the outputs. Experimental results demonstrate that the speedup gained by our approach can be orders of magnitude.

Moreover, we notice that although randomized matrix algorithms are very efficient in large-scale data analysis, they have not yet been used for training large-scale sparse learning models. By introducing randomized matrices into large-scale sparse learning, we develop a novel accelerated sparse linear regression method based on random projection. Our method achieves a global geometric convergence rate while greatly reducing the computational complexity per iteration. In addition, the sparsity of our intermediate solutions is well bounded over the iterations.

Finally, in real applications, owing to the extremely high dimensionality of biological data (such as human gene data) and the highly complex sparse models used in bioinformatics, model training is usually the bottleneck when applying large-scale sparse models to bioinformatics. For this problem, taking Alzheimer's disease as an example, we propose a new approach for identifying its genetic risk factors via shared tree-guided feature learning across multiple tasks. This is a unique technique that unifies the strength of a priori structural information in the feature space with a multi-task feature learning model to boost prediction performance. We further develop a novel screening rule to accelerate the training of our model. Our model significantly outperforms the state of the art in detecting genetic risk factors, and our screening rule can speed up model training by several orders of magnitude without sacrificing any accuracy.
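The online-to-batch issue discussed above can be illustrated with a minimal sketch (this is a generic proximal stochastic gradient method for the lasso, not the dissertation's algorithm): each step applies a soft-thresholding proximal operator, so the last iterate stays sparse, whereas the standard online-to-batch conversion of averaging all iterates generally destroys sparsity.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1: shrinks toward zero, exact zeros inside [-t, t].
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_sgd_lasso(X, y, lam=0.1, lr=0.01, epochs=20, seed=0):
    """Proximal stochastic gradient descent for least-squares lasso.

    Returning the last iterate (rather than the running average of
    iterates) preserves the sparsity produced by soft-thresholding.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            g = (X[i] @ w - y[i]) * X[i]          # stochastic gradient of the squared loss
            w = soft_threshold(w - lr * g, lr * lam)  # proximal step keeps w sparse
    return w
```

On synthetic data with a few strong true coefficients, the last iterate identifies the true support while keeping the remaining coordinates (near-)zero; averaging the iterates instead would yield a dense vector.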
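The random-projection idea can likewise be sketched in its simplest form, sketch-and-solve least squares (an illustrative assumption on our part; the dissertation's method is an iterative sparse solver with geometric convergence, not this one-shot scheme): a Gaussian sketch compresses the n-row problem to m rows, after which the small problem is solved directly.

```python
import numpy as np

def sketched_lstsq(X, y, m, seed=0):
    """Sketch-and-solve least squares via a Gaussian random projection.

    Projects the (n x d) problem down to (m x d) with m << n, then
    solves the small problem, cutting the solve cost from O(n d^2)
    to O(m d^2) plus the cost of applying the sketch.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    S = rng.standard_normal((m, n)) / np.sqrt(m)  # Gaussian sketch, E[S^T S] = I
    w, *_ = np.linalg.lstsq(S @ X, S @ y, rcond=None)
    return w
```

When m is moderately larger than d, the sketched solution approximates the full least-squares solution well; on noiseless data it recovers it exactly, since the sketched system is still consistent.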
Keywords/Search Tags: Large-Scale Sparse Learning, Stochastic Optimization, Screening, Randomized Matrix, Multi-Task Learning, Genetic Risk Factor