
Study Of Self-Expressive Feature Selection

Posted on: 2018-06-06
Degree: Master
Type: Thesis
Country: China
Candidate: W He
Full Text: PDF
GTID: 2348330518956590
Subject: Computer Science and Technology
Abstract/Summary:
High-dimensional data usually contain noise and redundancy. In particular, high dimensionality not only increases storage cost but also degrades data-mining performance once the number of dimensions exceeds a certain threshold, i.e., the "curse of dimensionality". On the other hand, labels are often difficult to obtain because of limited resources. Therefore, unsupervised dimensionality reduction methods, which reduce the dimensionality of unlabeled data to address these problems, are of great significance in data mining. Previous dimensionality reduction methods can be divided into subspace learning and feature selection: subspace learning is usually more effective, but the results of feature selection are interpretable. This thesis proposes two unsupervised feature selection methods that combine subspace learning and feature selection in a unified framework to select useful features from high-dimensional data (i.e., to remove redundancy and noise). As a result, the output low-dimensional data enhance the performance of learning models while remaining interpretable. The main contents and contributions of this thesis are as follows:

(1) Building on the successful use of sample-level self-expressiveness, this thesis exploits feature-level self-expressiveness to devise a simple and effective unsupervised feature selection framework, Unsupervised Feature Selection via Feature Self-Representation Property (SRFS for short). Specifically, SRFS uses a self-expressive loss function that represents each feature by the other features to obtain a self-expressive coefficient matrix, which is further penalized by a sparse l2,1-norm regularizer. In the proposed optimization of the resulting objective function, the sparse regularizer assigns higher coefficients to important features than to noisy/redundant features, thereby producing an importance score for every feature. In this way, SRFS uses the self-expressiveness of the features to interpret their importance, so that unimportant/noisy/redundant features receive small or even zero coefficients. In the experiments, the proposed method and previous feature selection methods are used to select features, and the selected features are then evaluated by support vector machine (SVM) classification; under this protocol, the proposed SRFS outperforms the comparison methods.
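This abstract does not spell out the SRFS objective, but the description matches the common feature self-representation formulation min_W ||X - XW||_F^2 + lambda*||W||_{2,1}, with features ranked by the row l2-norms of W. The following is a minimal illustrative sketch under that assumption; the function name srfs_rank, the solver choice (iteratively reweighted least squares), and the parameter lam are ours, not taken from the thesis.

import numpy as np

def srfs_rank(X, lam=1.0, n_iter=50, eps=1e-8):
    """Assumed SRFS-style objective:  min_W ||X - X W||_F^2 + lam * ||W||_{2,1}.
    Solved by iteratively reweighted least squares (IRLS); features are
    ranked by the l2 norms of the rows of the coefficient matrix W.
    X: (n_samples, n_features) data matrix."""
    n, d = X.shape
    G = X.T @ X                       # (d, d) Gram matrix
    D = np.ones(d)                    # IRLS reweighting diagonal
    for _ in range(n_iter):
        # Stationarity condition: (X^T X + lam * D) W = X^T X
        W = np.linalg.solve(G + lam * np.diag(D), G)
        row_norms = np.sqrt((W ** 2).sum(axis=1)) + eps
        D = 1.0 / (2.0 * row_norms)   # standard l2,1-norm IRLS update
    scores = np.sqrt((W ** 2).sum(axis=1))
    return np.argsort(-scores)        # most important features first

# Usage: keep the top 20 of 100 features before downstream SVM evaluation.
X = np.random.randn(200, 100)
X_reduced = X[:, srfs_rank(X, lam=0.1)[:20]]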
(2) Traditional feature selection methods do not take the relationships among features into account, such as their local or global structures. Based on the fact that outliers may increase the rank of the data matrix, this thesis combines a low-rank constraint, manifold learning, and hypergraph regularization with feature-level self-expressiveness in one framework, yielding a new feature selection method named "Feature Self-Representation Based Hypergraph Unsupervised Feature Selection via Low-Rank Representation" (SHLFS for short). Specifically, SHLFS, as an extension of the earlier SRFS, represents each feature by the other features, embeds a low-rank constraint to capture the global structure of the data, uses a hypergraph regularizer to capture the complex relationships among the data, and applies an l2,1-norm regularizer to achieve sparsity of the coefficient matrix (a plausible form of the combined objective is sketched at the end of this abstract). Compared with SRFS, the proposed SHLFS is more robust because it considers more relationships among the data. In the experiments, this thesis uses two kinds of evaluation, i.e., SVM classification and k-means clustering, on both multi-class and binary-class datasets to show that SHLFS outperforms existing dimensionality reduction methods in terms of several evaluation metrics.

In summary, this thesis designs new feature selection methods to overcome the drawbacks of previous methods. Specifically, it uses feature-level self-expressiveness to conduct unsupervised feature selection, employs hypergraph regularization and a low-rank constraint to model the complex relationships among the data, and uses sparse learning to assign important features large weights and noisy/redundant features small or even zero weights. To test their effectiveness, the proposed methods are evaluated on public datasets against state-of-the-art and classic methods, using classification and clustering metrics such as ACC (accuracy) and NMI (normalized mutual information). Experimental results show that the proposed methods achieve the best performance among all compared methods. In future work, I plan to design new semi-supervised learning and deep learning methods for feature selection to further address the challenges of high-dimensional data.
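The abstract describes SHLFS only in words. Writing L_H for the hypergraph Laplacian and using the nuclear norm as the usual convex surrogate for a rank constraint, one plausible (assumed, not quoted) form of the combined objective is:

\min_{W} \; \|X - XW\|_F^2 \;+\; \alpha \|W\|_{*} \;+\; \beta \operatorname{tr}\!\big((XW)^{\top} L_H \, XW\big) \;+\; \gamma \|W\|_{2,1}

where X is the data matrix, W the feature self-representation coefficient matrix, and alpha, beta, gamma are trade-off parameters. Whether the thesis imposes low rank via the nuclear norm or via a factorization such as W = AB is not stated here, so this equation should be read as a sketch of the described ingredients rather than the method's exact formulation.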
Keywords/Search Tags:Data mining, Graph learning, Low-rank constraint, Self-representation, Feature selection