
A Study On Unsupervised Feature Selection Algorithms For High Dimensional Data

Posted on: 2018-07-31 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: Y B Liu | Full Text: PDF
GTID: 1318330542481197 | Subject: Circuits and Systems
Abstract/Summary:
With the development of science and technology, machine learning has been applied to more and more areas of social life. In this field, a large number of problems, such as the recognition of biological data, face recognition, and data analysis in electronic commerce, can be reduced to the processing of high-dimensional data. High-dimensional data, however, pose new challenges: they are difficult to understand intuitively and also give rise to the "curse of dimensionality" in machine learning. Moreover, unlabeled data are often relatively plentiful compared to labeled data, and unsupervised learning is more challenging because it lacks the guidance of label information. How to select effective features for high-dimensional data is therefore one of the current research focuses and challenges. This thesis concentrates on unsupervised feature selection and proposes the following three methods.

1) Diversity-Induced Unsupervised Feature Selection
The main limitation of existing unsupervised feature selection methods is that they ignore the diversity of features and thus cannot capture comprehensive information from them. To this end, we propose a novel method, Diversity-Induced Self-Representation (DISR), for unsupervised feature selection. DISR considers both the representativeness and the diversity of features: the self-representation property is used to select the most representative features, while a diversity term, built from the similarities between features, adjusts their selection weights and guides the selection process. In this way the selected features carry as much information as possible and redundancy is greatly reduced. An efficient optimization algorithm based on Augmented Lagrangian Alternating Direction Minimization (AL-ADM) is provided. Experimental results on both clustering and classification tasks demonstrate the effectiveness of the proposed method.

2) Local Structure Preservation for Unsupervised Feature Selection
The main limitation of self-representation-based unsupervised feature selection methods is that they ignore the local structure of features. To this end, we establish unsupervised feature selection using Structured Self-Representation (SSR), which selects the most representative features by directly exploring the local geometrical structure of the feature space. In our model, a graph regularization term incorporates the locality information and an ℓ2,1-norm regularization term enforces sparsity on the feature-reconstruction coefficients. The resulting minimization problem is convex and can be solved efficiently by an iterative update scheme. Experiments on synthetic and real-world datasets demonstrate the encouraging performance of the proposed method over the state of the art.
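To make the shared self-representation idea behind DISR and SSR concrete, the following Python sketch scores features with a plain ℓ2,1-regularized self-representation model solved by iteratively reweighted least squares. It is a minimal illustration only: the diversity term of DISR and the graph regularization of SSR are omitted, and all function names and parameter values are illustrative rather than taken from the thesis.

import numpy as np

def self_rep_feature_scores(X, lam=1.0, n_iter=50, eps=1e-8):
    """Rank features via regularized self-representation (simplified sketch).

    Approximately solves  min_W ||X - X W||_F^2 + lam * ||W||_{2,1}
    by iteratively reweighted least squares; features are scored by the
    l2-norm of the corresponding rows of W.  The diversity / graph terms
    of the thesis methods are intentionally left out.
    """
    n, d = X.shape
    XtX = X.T @ X                                    # d x d Gram matrix of features
    W = np.linalg.solve(XtX + lam * np.eye(d), XtX)  # ridge-style warm start
    for _ in range(n_iter):
        row_norms = np.linalg.norm(W, axis=1)
        D = np.diag(1.0 / (2.0 * row_norms + eps))   # IRLS weights for ||W||_{2,1}
        W = np.linalg.solve(XtX + lam * D, XtX)      # closed-form update of W
    return np.linalg.norm(W, axis=1)                 # higher score = more representative

# toy usage: keep the 10 highest-scoring features of a random data matrix
X = np.random.randn(100, 50)
scores = self_rep_feature_scores(X, lam=0.5)
selected = np.argsort(scores)[::-1][:10]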
3) Structural Constraint for Unsupervised Feature Selection
To incorporate prior structural information into clustering, we propose an ideal Local Structure Learning (LSL) approach for unsupervised feature selection. Real-world data always contain many noisy samples and features, which can make the similarity matrix computed from the original data unreliable, whereas the similarity matrix of an ideal clustering has a clear block-diagonal structure. Unlike most existing techniques, which fix the data structure associated with the similarity matrix of the input data, we learn a more reasonable data similarity matrix with the block-diagonal property. Meanwhile, cluster label indicators obtained by spectral analysis guide the feature selection procedure, so that the most discriminative features can be selected to improve clustering accuracy. Experiments on various real-world datasets validate the effectiveness of the proposed method.

The methods proposed in this thesis exploit the diversity of features, the local structure of features, and a priori structural information for unsupervised feature selection. They constitute an effective exploration of the related problems and enrich the research topics in the field of feature selection.
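As a rough, hypothetical illustration of the spectral guidance used in the third contribution, the Python sketch below fixes a Gaussian kNN similarity matrix (instead of learning the block-diagonal matrix that LSL constructs) and scores features by their alignment with the resulting spectral cluster indicators; it is a simplified stand-in, not the thesis's LSL algorithm, and all names and parameters are illustrative.

import numpy as np

def spectral_guided_scores(X, n_clusters=3, n_neighbors=5):
    """Toy sketch: score features by their alignment with spectral cluster
    indicators derived from a similarity matrix.

    LSL learns a block-diagonal similarity matrix; here, as a simplification,
    a fixed Gaussian kNN similarity is used instead.
    """
    n, d = X.shape
    # pairwise squared distances and a Gaussian similarity matrix
    sq = np.sum(X**2, axis=1, keepdims=True)
    dist2 = sq + sq.T - 2.0 * X @ X.T
    sigma2 = np.mean(dist2) + 1e-12
    S = np.exp(-dist2 / sigma2)
    np.fill_diagonal(S, 0.0)
    # keep only the k nearest neighbours of each sample, then symmetrize
    idx = np.argsort(-S, axis=1)[:, n_neighbors:]
    for i in range(n):
        S[i, idx[i]] = 0.0
    S = (S + S.T) / 2.0

    # normalized graph Laplacian; its bottom eigenvectors act as soft cluster indicators
    deg = S.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg + 1e-12))
    L = np.eye(n) - D_inv_sqrt @ S @ D_inv_sqrt
    _, vecs = np.linalg.eigh(L)
    F = vecs[:, :n_clusters]                 # spectral embedding / indicator surrogate

    # score each centered, normalized feature by its alignment with F
    Xc = X - X.mean(axis=0)
    Xc /= (np.linalg.norm(Xc, axis=0) + 1e-12)
    return np.linalg.norm(F.T @ Xc, axis=0)  # higher = more discriminative

# toy usage: select the 10 features best aligned with the cluster structure
X = np.random.randn(80, 30)
scores = spectral_guided_scores(X)
selected = np.argsort(scores)[::-1][:10]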
Keywords/Search Tags: High dimensional data, unsupervised feature selection, diversity of features, local structure, a priori structural constraints