Self-Representation Based Feature Selection Models And Algorithms

Posted on: 2018-04-16
Degree: Master
Type: Thesis
Country: China
Candidate: W C Zhu
Full Text: PDF
GTID: 2348330542481357
Subject: Computer technology
Abstract/Summary:
In the era of big data, everything around us produces large amounts of data all the time. The data are not only large in volume but also high in dimension and complex in category, so we need to handle very high-dimensional, complex data. However, as the dimension of the feature space increases, the number of model parameters increases exponentially and the model becomes more and more complex. This easily causes over-fitting and thus reduces the generalization ability of the model. In addition, data samples cannot be visualized when the dimension is very high. How to solve the 'curse of dimensionality' caused by high-dimensional data is therefore a key problem in pattern recognition and machine learning.

Feature selection is a data preprocessing method that has proved effective in high-dimensional data mining and machine learning. However, in most practical applications it is costly to obtain data labels, so unsupervised feature selection (UFS) becomes more meaningful. Most existing UFS methods generate pseudo labels by spectral clustering, matrix factorization or dictionary learning, and thereby convert UFS into a supervised problem. Many UFS algorithms do not consider the complexity of the data, for example multi-view features. In this thesis, we exploit the correlation among different views.

This thesis proposes three self-representation based unsupervised feature selection models. Since self-representation can mine the similarity between data samples or between data features, it can be used to guide unsupervised feature selection. Self-representation models include sample-level models and feature-level models, and the three models proposed here use the properties of sample-level and feature-level self-representation respectively. Our experimental results show that these three models are all efficient and achieve the best results on the test data. The research results and innovations are summarized as follows:

1. We propose a novel subspace clustering guided unsupervised feature selection (SCUFS) method. The cluster labels of the training samples are learned by representation-based subspace clustering, and the features that best preserve the cluster labels are selected.
2. SCUFS learns the data distribution well, in that it uncovers the underlying multi-subspace structure of the data and iteratively learns the similarity matrix and the cluster labels.
3. We propose a non-convex regularized self-representation (RSR) model in which each feature can be represented by a linear combination of other features, and we impose L2,p-norm (0 ≤ p < 1) regularization on the self-representation coefficients for unsupervised feature selection.
4. We further propose an efficient iterative reweighted least squares (IRLS) algorithm with guaranteed convergence to a fixed point. When p = 0, we use the augmented Lagrangian method (ALM) to solve the RSR model.
5. We propose a multi-view dictionary learning based unsupervised feature selection model, which is also a feature-level self-representation method. The model assumes that the features are redundant and that the similarity matrix between features is low rank, and it uses this property to carry out unsupervised feature selection.
6. We give a solution to the optimization problem of the multi-view dictionary learning based unsupervised feature selection model and demonstrate the effectiveness of the optimization algorithm.
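
To make the feature-level RSR model and its IRLS solver described above more concrete, the following is a minimal Python sketch, not the thesis's own implementation. It assumes the objective ||X - XW||_F^2 + lam * sum_i ||w_i||_2^p, where w_i is the i-th row of the coefficient matrix W and p lies strictly between 0 and 1; the abstract does not state the exact loss, so this form is an assumption. The function names `rsr_irls` and `select_features` and the parameters `lam` and `eps` are hypothetical. The p = 0 case, which the abstract says is solved with ALM, is not covered here.

```python
import numpy as np


def rsr_irls(X, lam=1.0, p=0.5, n_iter=50, eps=1e-8):
    """IRLS sketch for an assumed RSR objective.

    Minimizes ||X - X W||_F^2 + lam * sum_i ||W[i, :]||_2^p for 0 < p < 1,
    where X has shape (n_samples, n_features). The objective form is an
    assumption made for illustration.
    """
    n_features = X.shape[1]
    G = X.T @ X  # Gram matrix of the features

    # Ridge-regularized start so that no row of W is exactly zero.
    W = np.linalg.solve(G + lam * np.eye(n_features), G)

    for _ in range(n_iter):
        # Reweighting: each row gets weight (p/2) * (||w_i||^2 + eps)^(p/2 - 1),
        # so rows with small norm are penalized more and shrink toward zero.
        row_sq = np.sum(W ** 2, axis=1) + eps
        d = (p / 2.0) * row_sq ** (p / 2.0 - 1.0)

        # Weighted least-squares step: (G + lam * D) W = G.
        W = np.linalg.solve(G + lam * np.diag(d), G)
    return W


def select_features(W, k):
    """Rank features by the L2 norm of their rows in W and keep the top k."""
    scores = np.linalg.norm(W, axis=1)
    return np.argsort(scores)[::-1][:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 8))  # synthetic data, for illustration only
    W = rsr_irls(X, lam=1.0, p=0.5)
    print(select_features(W, k=3))
```

A feature whose row in W has a large norm contributes strongly to reconstructing the other features, which is why the row norms serve as selection scores in this sketch. The small constant `eps` only guards against division by zero when a row vanishes.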
Keywords/Search Tags: unsupervised feature selection, multi-view, self-representation, subspace clustering, optimization problem