
Sparse And Low-rank Theory Based Feature Selection Algorithms

Posted on: 2021-06-26    Degree: Doctor    Type: Dissertation
Country: China    Candidate: W Zheng    Full Text: PDF
GTID: 1488306512982369    Subject: Control Science and Engineering

Abstract/Summary:
Feature selection, as one of the crucial methods of dimension reduction, has long been an active topic in machine learning and pattern recognition. Because it is interpretable and preserves the original features, feature selection helps us understand the causal association between features and labels, and is therefore widely used in bioinformatics, text classification, image processing, and social media networks. With the advent of the big-data era, feature selection also faces many challenges, such as data noise, vague data structure in unsupervised settings, and the potential over-fitting risk of the subsets selected by traditional methods. Meanwhile, the form of data handled by feature selection has gradually evolved from static data to dynamic data, which is more common in the real world. This paper makes several efforts to handle existing problems and new challenges in traditional feature selection methods. The main research results are summarized as follows:

Firstly, a new collaborative learning framework is proposed for feature selection. To further improve the generalization performance of the selected subset, this method introduces an extra classifier for the unselected features into the traditional embedded model and jointly learns the feature weights so as to maximize the classification loss of the unselected features. As a result, the extra classifier forces unselected strongly relevant features to replace weakly relevant features in the selected subset. The final objective is formulated as a min-max optimization problem, feature selection is implemented through sparse constraints, and a simple and effective gradient algorithm is used for optimization (a toy sketch follows the third method below). Furthermore, we theoretically prove that the proposed method improves the generalization ability of traditional embedded feature selection methods. Extensive experiments on synthetic and real-world datasets exhibit the interpretability and superior performance of the proposed method.

Secondly, an unsupervised feature selection method based on low-rank structure preserving is presented. To weaken the "similarity" between the selected features and enhance the "similarity" between highly correlated samples, the proposed method integrates sparse regularization for data reconstruction and a low-rank constraint for structure preserving in a unified framework. In this model, the data matrix consisting of the selected features is treated as a dictionary, which is learned under a low-rank constraint so as to preserve the subspace structure. Meanwhile, the sparse penalty removes redundant features, which helps to learn the intrinsic structure. In this way, the sample distribution is preserved more precisely by the low-rank constraint through the use of discriminative features; in turn, the refined sample structure boosts the selection of more representative features (see the sketch below). Both theoretical and experimental results support the effectiveness of the method.

Thirdly, a robust unsupervised feature selection method based on nonnegative sparse subspace learning is introduced. This method casts the unsupervised feature selection scenario as a matrix factorization problem from the viewpoint of sparse subspace learning. By minimizing the reconstruction residual, the learned feature weight matrix, under group-sparse and non-negative constraints, not only removes irrelevant features but also captures the underlying low-dimensional structure of the data points; the non-negative constraint gives the model better interpretability. Furthermore, to enhance robustness, a sparse regularizer is used to reduce the impact of outliers and sparse noise. An efficient iterative algorithm is designed to optimize this non-convex and non-smooth objective function, and its convergence is proved. It is worth mentioning that the non-negative iterative multipliers in this model differ from the classical schemes based on a purely additive principle, but the non-negativity of these multipliers is proved. Comparative experiments demonstrate the superiority of the model on various benchmark datasets with and without malicious pollution.
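To make the first contribution concrete, here is a minimal sketch of a min-max collaborative scheme of the kind described above. It is a hypothetical simplification, not the thesis's exact formulation: two logistic classifiers act on the selected view w * x and the complementary view (1 - w) * x, and the feature weights w descend the first loss while ascending the second, under an l1 penalty.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def collaborative_fs(X, y, lam=0.01, lr=0.1, epochs=500, seed=0):
    """Min-max sketch (hypothetical): t1 classifies the selected view
    (w * x), t2 the complementary view ((1 - w) * x); w minimizes t1's
    logistic loss while maximizing t2's, under an l1 sparsity penalty."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    w = np.full(d, 0.5)                      # feature weights kept in [0, 1]
    t1 = rng.normal(scale=0.01, size=d)      # classifier on selected view
    t2 = rng.normal(scale=0.01, size=d)      # classifier on unselected view
    for _ in range(epochs):
        p1 = sigmoid((X * w) @ t1)
        p2 = sigmoid((X * (1.0 - w)) @ t2)
        g1, g2 = (p1 - y) / n, (p2 - y) / n  # logistic-loss gradients w.r.t. logits
        t1 -= lr * (X * w).T @ g1            # both classifiers descend their own loss
        t2 -= lr * (X * (1.0 - w)).T @ g2
        # w descends loss1 and ascends loss2 (the min-max coupling) ...
        w -= lr * ((g1 @ X) * t1 + (g2 @ X) * t2)
        # ... then a proximal step for the l1 penalty, clipped back to [0, 1]
        w = np.clip(np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0), 0.0, 1.0)
    return w                                 # rank features by their weights

# toy usage on random data whose first feature determines the label
X = np.random.default_rng(1).normal(size=(200, 20))
y = (X[:, 0] > 0).astype(float)
print(np.argsort(-collaborative_fs(X, y))[:5])
```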
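For the second contribution, the sketch below alternates proximal-gradient steps on an assumed surrogate objective, min over w and C of ||X - X diag(w) C||_F^2 + alpha*||C||_* + beta*||w||_1, rather than the thesis's exact model: singular-value thresholding keeps the coefficient matrix low-rank (structure preserving), while a non-negative soft threshold sparsifies the feature weights.

```python
import numpy as np

def lowrank_structure_fs(X, alpha=1.0, beta=0.1, lr=1e-3, iters=200):
    """Alternating proximal-gradient sketch: the weighted features
    X diag(w) act as a dictionary, and the nuclear norm keeps the
    coefficient matrix C low-rank to preserve subspace structure."""
    d = X.shape[1]
    w = np.ones(d)
    C = np.eye(d)
    for _ in range(iters):
        # gradient step on C, then singular-value thresholding (prox of ||.||_*)
        R = X - (X * w) @ C
        C += lr * 2.0 * (X * w).T @ R
        U, s, Vt = np.linalg.svd(C, full_matrices=False)
        C = (U * np.maximum(s - lr * alpha, 0.0)) @ Vt
        # gradient step on w, then non-negative soft threshold (prox of l1)
        R = X - (X * w) @ C
        grad_w = -2.0 * np.sum((X.T @ R) * C, axis=1)
        w = np.maximum(w - lr * grad_w - lr * beta, 0.0)
    return w    # larger w_j => feature j contributes more to the structure
```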
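For the third contribution, the sketch below uses the classical multiplicative update rules on a simplified stand-in model, min over non-negative W and H of ||X - X W H||_F^2 + lam*||W||_{2,1}, assuming non-negative data; note the thesis itself derives different, non-classical multipliers whose non-negativity it proves separately.

```python
import numpy as np

def nonneg_subspace_fs(X, k=10, lam=0.1, iters=300, seed=0, eps=1e-9):
    """Classical multiplicative updates for a simplified nonnegative
    subspace model, assuming X >= 0. The l2,1 penalty drives whole rows
    of W to zero, i.e. discards the corresponding features."""
    d = X.shape[1]
    rng = np.random.default_rng(seed)
    W = rng.random((d, k))
    H = rng.random((k, d))
    A = X.T @ X                          # nonnegative Gram matrix when X >= 0
    for _ in range(iters):
        row = np.linalg.norm(W, axis=1, keepdims=True) + eps
        W *= (A @ H.T) / (A @ W @ (H @ H.T) + lam * W / (2.0 * row) + eps)
        H *= (W.T @ A) / (W.T @ A @ W @ H + eps)
    return np.linalg.norm(W, axis=1)     # row norms score the features
```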
Finally, in the streaming-feature scenario, the data contain abundant redundancy, and in the unsupervised setting the intrinsic structure of the sample space is hard to describe using only the features that have arrived. Existing unsupervised algorithms usually employ a simplified approximation of a specific regression model for online testing and cannot dynamically estimate and maintain the structure of the sample space, so it is difficult to preserve the intrinsic characteristics of the data. To address this issue, we introduce metric fusion to maintain a similarity matrix that captures the latest relationships between samples. This similarity matrix is updated by each arriving batch of features and is also employed to guide the selection within the next batch. We formulate an objective via matrix alignment to remove redundant features and preserve useful ones. Moreover, this paper characterizes the feature selection vectors with sparse constraints, designs an iterative shrinkage-thresholding method to solve the proposed optimization model, derives the compound threshold operator, and analyses the convergence of the algorithm (a toy sketch follows the concluding paragraph below). The validity of the method is verified on mainstream feature selection datasets.

In general, applying sparse and low-rank theory to feature selection models brings physical interpretability and theoretical convergence guarantees; algorithms of this kind obtain excellent results and have received increasing attention in recent years. This paper focuses on the feature selection task in combination with the latest results of sparse and low-rank theory. We introduce a feature selection framework with dual cooperating classifiers in the supervised setting, design unsupervised algorithms for noisy data and low-rank structured data respectively, and present an online selection scheme for streaming features in the unsupervised setting. The full text covers the two application scenarios of static and dynamic data, and it provides some new ideas for feature selection tasks under different application backgrounds.
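As promised above, here is a toy sketch of the streaming scheme. It is an assumed simplification, not the thesis's algorithm: for each arriving batch, a non-negative sparse weight vector is fit by iterative shrinkage thresholding so that the weighted sum of rank-one feature similarities aligns with the maintained sample-similarity matrix S, whose prox step (a non-negative soft threshold) plays the role of the compound threshold operator; S is then fused with the similarity induced by the kept features.

```python
import numpy as np

def batch_similarity(F):
    # cosine similarity between samples, using only this batch's features
    Z = F / (np.linalg.norm(F, axis=1, keepdims=True) + 1e-12)
    return Z @ Z.T

def select_batch(S, F, lam=0.1, iters=200):
    # ISTA on  min_{w>=0} ||S - sum_j w_j f_j f_j^T||_F^2 + lam*||w||_1
    _, b = F.shape
    K = np.stack([np.outer(F[:, j], F[:, j]) for j in range(b)])
    lr = 1.0 / (2.0 * sum(np.sum(Kj * Kj) for Kj in K) + 1e-12)  # crude step size
    w = np.zeros(b)
    for _ in range(iters):
        R = S - np.tensordot(w, K, axes=1)                 # alignment residual
        grad = -2.0 * np.array([np.sum(Kj * R) for Kj in K])
        w = np.maximum(w - lr * grad - lr * lam, 0.0)      # compound (nonneg soft) threshold
    return w

def streaming_fs(batches, mu=0.5, lam=0.1):
    # batches: iterable of (n x b_t) feature blocks arriving over time
    S, kept = None, []
    for F in batches:
        if S is None:
            S = batch_similarity(F)      # bootstrap the sample-similarity matrix
        w = select_batch(S, F, lam=lam)
        sel = np.flatnonzero(w > 1e-8)
        kept.append(sel)
        if sel.size:                     # fuse structure from the kept features
            S = (1.0 - mu) * S + mu * batch_similarity(F[:, sel])
    return kept
```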
Keywords/Search Tags:Feature selection, Sparse subspace learning, Low-rank constraint, Bi-level optimization model, Co-training, Unsupervised learning, Streaming feature, Non-negative matrix factorization, Generalization performance