
Research On Learning Models Based On Sparse Optimization And Their Applications

Posted on: 2019-10-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: T J Luo
Full Text: PDF
GTID: 1368330623450369
Subject: Systems analysis and integration
Abstract/Summary:
With the development of data collection and network technology, it has become easier and faster than before to obtain data in many fields; the era of big data has arrived. Datasets exhibit various characteristics, such as high-dimensional features, heterogeneous forms, complex structures, and a lack of annotations, which pose challenges for subsequent data processing. At the same time, the data are sparse in both feature space and instance space, so learning methods based on sparse optimization are a natural way to mine valuable information effectively. From the perspective of scientific research, the study of learning models based on sparse optimization and their applications is one of the hot topics in machine learning and has great theoretical and practical significance. Using methods from matrix theory, statistics, and machine learning, this dissertation studies network reconstruction, feature learning, and multiple-instance learning based on sparse optimization theory thoroughly and systematically. The main results and contributions of this dissertation are as follows.

1. A new reweighting optimization framework is proposed to solve different types of learning problems. Under this framework, fast optimization methods are designed for problems with different convexity and smoothness conditions, which deepens the understanding of sparse learning problems across applications. By comprehensively analyzing and summarizing the differences and relationships among the optimization problems arising in sparse learning models, they are divided into four groups and unified into a single reweighting optimization framework. Under this framework, we propose a continuous fast gradient descent method with a sparse structural convex penalty, a generalized sparse low-rank iteratively reweighted minimization method, a nonconvex locally linear approximation gradient descent method, and an accelerated block coordinate descent method. These optimization methods are successfully applied to solve the sparse learning problems studied in this thesis.

2. Motivated by the idea of sparse signal reconstruction, we propose a new robust network structure reconstruction approach with a sparse total-variation regularizer, which adaptively learns the neighborhood structure and is robust to the noise of structured network data. The network structure reconstruction problem can be reformulated from the viewpoint of sparse signal recovery. Motivated by this, a novel robust network structure reconstruction model is proposed to select neighborhoods adaptively. By introducing a sparse structured elastic-net penalty and total-variation regularization, our model automatically captures network structure information and quickly identifies the neighborhood structure of nodes. Experimental results show that its robustness to noise is significantly better than that of traditional reconstruction methods.

3. Unlike traditional approaches, a hierarchical tree structure is used to model the intrinsic structure of the data. We propose shared tree-guided sparse feature learning across multiple tasks (STM) for identifying genetic risk factors of Alzheimer's disease, and design hierarchical feature screening (HFS) rules that accelerate the training of STM and improve the ability to identify causal variants of complex diseases. To identify genetic risk factors for Alzheimer's disease, we propose a novel method that improves generalization performance by integrating tree structures with multi-task feature learning, deeply exploring the hierarchical structure of features and the common structural information of multiple related tasks. However, owing to the highly complex regularizer that encodes the tree structure and the extremely high feature dimensionality, the learning process can be computationally prohibitive. To address this, we further develop an effective hierarchical feature screening rule to quickly identify and remove irrelevant features before training the model. Experimental results and medical evaluation demonstrate that the proposed approach significantly outperforms the state of the art in detecting genetic risk factors, and that the screening rule markedly speeds up training.

4. To handle noisy features, a lack of annotations, and outliers, a new robust discriminative feature learning method based on a sparse low-rank model and a semi-supervised feature selection method based on an insensitive sparse regression model are proposed, which improve applicability to open environments with noise and outliers. Traditional methods are sensitive to noise and outliers, yet open environments contain many of both, together with a lack of annotations. To address this, we propose robust discriminant feature learning (RDFL), in which the noise level of each class and each sample is adaptively calibrated to improve noise robustness, by jointly minimizing the ℓ2,1-norm reconstruction error and the intraclass distance so as to preserve discriminative features. In addition, to handle a large number of unlabeled instances and a small number of falsely labeled instances, a semi-supervised feature selection method based on an insensitive sparse regression model (ISR) is proposed by integrating an ℓ2,q-norm sparse penalty and a capped ℓ2-ℓp-norm loss with soft labels enhanced by label propagation. Extensive experiments on multiple public datasets indicate that our methods outperform competing feature selection methods.

5. Owing to the lack of isoform-level ground truth, many existing functional annotation approaches operate at the gene level. We propose a novel approach that differentiates the functions of protein-coding isoforms (PCIs) by integrating a nonconvex sparsity-inducing regularizer with the framework of multi-instance learning (MIL). The optimal classification hyperplanes of traditional multi-instance learning methods are sensitive to potential negative instances in positive bags. To address this, for each positive bag a weight vector is introduced to measure the contribution of its instances, adaptively select the potential positive instances, and enhance the discriminative performance of the model. To differentiate the functions of human PCIs, we then construct a nonconvex multi-instance learning framework by integrating sparse simplex projection, a nonconvex sparsity-inducing regularizer, with MIL. The framework is flexible enough to incorporate various smooth and non-smooth loss functions such as the logistic and hinge losses; on this basis we propose weighted logistic-regression-based MIL and weighted hinge-loss-based MIL to improve the performance of functional annotation. Extensive experiments on human genome data show that our methods significantly outperform state-of-the-art methods in both the functional annotation accuracy of human PCIs and efficiency.
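To make the reweighting idea of contribution 1 concrete, the following minimal sketch (an illustrative example, not one of the dissertation's actual algorithms) applies iteratively reweighted least squares to ℓ1-regularized regression: each pass solves a closed-form weighted ridge problem whose weights 1/(|x_i| + ε) majorize the sparsity penalty, driving small coefficients toward zero. All names and parameter values here are assumptions for illustration.

```python
import numpy as np

def irls_sparse(A, b, lam=1e-3, eps=1e-4, iters=100):
    """Iteratively reweighted least squares for l1-regularized regression.

    Each iteration solves a weighted ridge problem in closed form; the
    weights w_i = 1/(|x_i| + eps) majorize the l1 penalty, so repeated
    solves shrink small coefficients toward zero.
    """
    m, n = A.shape
    x = np.zeros(n)
    AtA, Atb = A.T @ A, A.T @ b
    for _ in range(iters):
        w = 1.0 / (np.abs(x) + eps)                 # reweighting step
        x = np.linalg.solve(AtA + lam * np.diag(w), Atb)
    return x

# Recover a 3-sparse signal from noiseless measurements (toy setup).
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
x_true = np.zeros(20)
x_true[[3, 7, 15]] = [1.5, -2.0, 1.0]
b = A @ x_true
x_hat = irls_sparse(A, b)
```

The ε floor keeps the weights finite as coefficients approach zero; in practice it is often decreased over the iterations rather than held fixed.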
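The ℓ2,1-norm appearing in contribution 4 sums the ℓ2 norms of the rows of a coefficient matrix, so minimizing it zeroes out entire rows and thereby discards whole features. A standard building block for such models (a generic sketch, not taken from the thesis) is the ℓ2,1 proximal operator, which shrinks each row and eliminates rows whose norm falls below the threshold:

```python
import numpy as np

def prox_l21(V, lam):
    """Proximal operator of lam * ||V||_{2,1} (sum of row l2 norms).

    Each row is scaled by max(0, 1 - lam/||row||), so rows with norm
    below lam are set exactly to zero -- the row-sparsity mechanism
    behind l2,1-regularized feature selection.
    """
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
    return scale * V

V = np.array([[3.0, 4.0],    # row norm 5  -> shrunk by factor 0.8
              [0.5, 0.0]])   # row norm .5 -> below lam, zeroed
W = prox_l21(V, 1.0)
```

In a proximal-gradient loop this operator is applied after each gradient step on the smooth loss.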
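Contribution 5 constrains each positive bag's weight vector to be nonnegative, sum to one, and be sparse. One common way to realize such a constraint, sketched here under the assumption that the thesis uses a greedy top-k variant (its exact projection may differ), is to keep the k largest entries and project them onto the probability simplex:

```python
import numpy as np

def simplex_projection(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]                      # sort descending
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1))[0][-1]
    theta = (css[rho] - 1) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def sparse_simplex_projection(v, k):
    """Greedy surrogate for projecting onto
    {w : w >= 0, sum(w) = 1, ||w||_0 <= k}:
    keep the k largest entries, project them onto the simplex,
    and zero out the rest."""
    w = np.zeros_like(v)
    idx = np.argsort(-v)[:k]
    w[idx] = simplex_projection(v[idx])
    return w

# Bag-level instance weights: only 2 instances may carry mass.
w = sparse_simplex_projection(np.array([0.5, 0.2, -0.1, 0.8]), k=2)
```

The result is a valid sparse weight vector, so a projected-gradient MIL solver can alternate between updating the classifier and re-projecting each positive bag's weights.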
Keywords/Search Tags:Iteratively Reweighted Optimization Framework, Network Reconstruction, Semi-supervised Learning, Robust Feature Selection, Multi-Task Feature Learning, Nonconvex Multi-Instance Learning, Sparse Optimization, Face Recognition, Video Semantic Recognition