
The Study Of Large Scale Sparse Learning:Optimization And Its Applications

Posted on: 2018-02-18
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W Z Zhang
Full Text: PDF
GTID: 1318330512499428
Subject: Computer application technology

Abstract/Summary:
Large-scale sparse learning is one of the most important methods in machine learning and has been widely used in many real-world applications, such as text mining, bioinformatics, image processing, and news recommendation. However, in large-scale applications involving a huge number of samples and extremely high-dimensional features, training sparse learning models remains challenging. Large-scale sparse learning has therefore been a research hotspot in both academia and industry for several years.

Most existing training algorithms for large-scale sparse learning are based on stochastic composite optimization. However, due to limitations of the online-to-batch conversion, they often fail to deliver sparse solutions. To address this problem, we propose a simple yet effective stochastic composite optimization scheme that adds a novel, powerful sparse online-to-batch conversion to general stochastic optimization algorithms. We further develop three concrete algorithms under this scheme to demonstrate its effectiveness. Both theoretical analysis and experimental results show that our methods outperform existing methods in their ability to learn sparse models, while decreasing the high-probability bound by one order of magnitude.

Screening is an emerging technique that has been shown to be promising in accelerating the training of large-scale sparse learning models. Screening methods quickly identify and remove irrelevant features or samples from the model, thereby shrinking the model size and improving training efficiency. Unfortunately, existing screening methods study either feature screening or sample screening individually, so they are of little help in scenarios with both a huge sample size and high-dimensional features. We propose a novel screening approach for accelerating sparse support vector machine training that, based on accurate estimations of the primal and dual optima, simultaneously identifies the features and samples that are guaranteed to be irrelevant to the outputs. Experimental results demonstrate that the speedup gained by our approach can be orders of magnitude.

Moreover, we notice that although randomized matrix algorithms are very efficient in large-scale data analysis, they have not yet been used for training large-scale sparse learning models. By introducing randomized matrices into large-scale sparse learning, we develop a novel accelerated sparse linear regression method based on random projection. Our method achieves a global geometric convergence rate while greatly reducing the computational complexity per iteration. In addition, the sparsity of our intermediate solutions is well bounded over the iterations.

Finally, in real applications, owing to the extremely high dimensionality of biological data (such as human gene data) and the highly complex sparse models used in bioinformatics, model training is usually the bottleneck when applying large-scale sparse models to bioinformatics. For this problem, taking Alzheimer's disease as an example, we propose a new approach for identifying its genetic risk factors via shared tree-guided feature learning across multiple tasks. This is a unique technique that unifies the strength of a priori structural information in the feature space with a multi-task feature learning model to boost prediction performance. We further develop a novel screening rule to accelerate the training of our model. Our model significantly outperforms the state of the art in detecting genetic risk factors, and our screening rule can speed up model training by several orders of magnitude without sacrificing any accuracy.
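The online-to-batch issue discussed above can be illustrated with a minimal sketch (this is a generic proximal stochastic gradient method for the lasso, not the dissertation's algorithm): each step applies a soft-thresholding proximal operator, so the last iterate stays sparse, whereas the standard online-to-batch conversion of averaging all iterates generally destroys sparsity.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1: shrinks toward zero, exact zeros inside [-t, t].
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_sgd_lasso(X, y, lam=0.1, lr=0.01, epochs=20, seed=0):
    """Proximal stochastic gradient descent for least-squares lasso.

    Returning the last iterate (rather than the running average of
    iterates) preserves the sparsity produced by soft-thresholding.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            g = (X[i] @ w - y[i]) * X[i]          # stochastic gradient of the squared loss
            w = soft_threshold(w - lr * g, lr * lam)  # proximal step keeps w sparse
    return w
```

On synthetic data with a few strong true coefficients, the last iterate identifies the true support while keeping the remaining coordinates (near-)zero; averaging the iterates instead would yield a dense vector.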
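The random-projection idea can likewise be sketched in its simplest form, sketch-and-solve least squares (an illustrative assumption on our part; the dissertation's method is an iterative sparse solver with geometric convergence, not this one-shot scheme): a Gaussian sketch compresses the n-row problem to m rows, after which the small problem is solved directly.

```python
import numpy as np

def sketched_lstsq(X, y, m, seed=0):
    """Sketch-and-solve least squares via a Gaussian random projection.

    Projects the (n x d) problem down to (m x d) with m << n, then
    solves the small problem, cutting the solve cost from O(n d^2)
    to O(m d^2) plus the cost of applying the sketch.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    S = rng.standard_normal((m, n)) / np.sqrt(m)  # Gaussian sketch, E[S^T S] = I
    w, *_ = np.linalg.lstsq(S @ X, S @ y, rcond=None)
    return w
```

When m is moderately larger than d, the sketched solution approximates the full least-squares solution well; on noiseless data it recovers it exactly, since the sketched system is still consistent.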
Keywords/Search Tags: Large-Scale Sparse Learning, Stochastic Optimization, Screening, Randomized Matrix, Multi-Task Learning, Genetic Risk Factor