
Embedded Unsupervised Feature Selection Based On Sparse Learning

Posted on: 2020-01-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y Fan
Full Text: PDF
GTID: 2518306518463104
Subject: Software engineering
Abstract/Summary:
With the advent of artificial intelligence and the era of big data, the number of features used to describe data has exploded in many domains. High-dimensional features can degrade the performance of learning algorithms, increase computational and storage costs, and easily lead to overfitting. Feature selection aims to select the most representative features and eliminate redundant ones from the original data, and has proven to be an effective way to reduce dimensionality. In the real world, unlabeled data is increasingly common because manual labeling is laborious and prior knowledge is often lacking. Without labels, unsupervised feature selection is vital for the comprehensive analysis of unlabeled high-dimensional data. At present, sparse learning is widely used in unsupervised feature selection: it couples the feature selection process with a learning model and applies a sparse regularization term to the feature selection matrix, so that training the model yields a sparse solution consistent with the semantics of feature selection. However, existing methods are not perfect and have certain limitations; for example, they do not consider the distribution information in the feature space or the redundancy relationships between features. From the perspective of sparse learning, this thesis proposes two new unsupervised feature selection algorithms.

The first work is a sparse unsupervised feature selection method based on latent space embedding. Inspired by multimodal learning, we treat the feature space and the pseudo-label space each as a modality of the data. Through joint dictionary learning, we obtain a latent space shared by the feature space and the pseudo-label space, which reflects the consistent distribution information of the two modalities. We use spectral clustering to learn better pseudo-label information, ensuring the completeness of the latent space. To select a subset of features that simultaneously preserves the distribution information of the feature space and the pseudo-label space, we introduce a linear regression model that minimizes the fitting error from the feature space to the latent space. By applying an ℓ2,1-norm regularization term to the feature selection matrix, features that are weakly related to the latent space are discarded. Clustering results on datasets from different fields verify the superiority of the algorithm.

The second work is a sparse unsupervised feature selection method based on redundancy minimization. We extend the traditional matrix-factorization-based approach, addressing its main shortcoming: it ignores the redundancy among features. Highly correlated features typically receive similar weights or rankings, so if the correlation between features is not considered when selecting top-ranked features, the final result contains redundant information. To this end, we define a regularizer that penalizes highly correlated features and embed it, together with matrix factorization, into a coherent model. We further investigate the effect of ℓ2,p-norm regularization on the feature selection framework. For the resulting optimization problem, we design an efficient iterative algorithm. Finally, experimental results on image datasets validate the effectiveness of the proposed approach.
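The sparse-learning recipe underlying the first work can be sketched as follows: fit a target matrix (a pseudo-label or latent representation) with a linear model under an ℓ2,1-norm penalty, then rank features by the row norms of the learned selection matrix. This is a minimal illustration using the standard iteratively reweighted least-squares update for the ℓ2,1 term; the synthetic data and variable names are hypothetical, not taken from the thesis.

```python
import numpy as np

def l21_regression(X, Y, lam=1.0, n_iter=50):
    """Minimize ||X W - Y||_F^2 + lam * ||W||_{2,1} over W.

    Classic reweighting trick: replace the l2,1 term by tr(W^T D W)
    with D diagonal, D_ii = 1 / (2 ||w_i||_2), then alternate between
    the closed-form W update and refreshing D.
    """
    n, d = X.shape
    D = np.eye(d)
    for _ in range(n_iter):
        # Closed-form update: W = (X^T X + lam * D)^{-1} X^T Y
        W = np.linalg.solve(X.T @ X + lam * D, X.T @ Y)
        row_norms = np.linalg.norm(W, axis=1)
        D = np.diag(1.0 / (2.0 * np.maximum(row_norms, 1e-8)))
    return W

def select_features(W, k):
    """Score each feature by the l2 norm of its row in W; keep the top k."""
    scores = np.linalg.norm(W, axis=1)
    return np.argsort(scores)[::-1][:k]

# Toy check: the target depends only on the first three features,
# so those rows of W should dominate after training.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
Y = X[:, :3] @ rng.standard_normal((3, 4))   # pseudo-label/latent target
W = l21_regression(X, Y, lam=0.5)
top = select_features(W, 3)
```

The row-wise ℓ2,1 penalty is what makes whole rows of W shrink together, which is exactly the "discard features weakly related to the latent space" behavior described above.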
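The redundancy-minimization idea of the second work can be illustrated with a simplified stand-in: a penalty that charges pairs of correlated features by the product of their weight-row norms, and a greedy selector that trades relevance against correlation with already-chosen features. The penalty form, the `beta` trade-off, and the greedy procedure are illustrative assumptions, not the thesis's exact model.

```python
import numpy as np

def redundancy_penalty(W, C):
    """Illustrative regularizer: sum over i != j of |C_ij| * ||w_i|| * ||w_j||.

    C is the feature correlation matrix; highly correlated feature pairs
    that both carry large weights are penalized the most.
    """
    norms = np.linalg.norm(W, axis=1)
    A = np.abs(C).copy()
    np.fill_diagonal(A, 0.0)       # a feature is not redundant with itself
    return float(norms @ A @ norms)

def greedy_low_redundancy(scores, C, k, beta=1.0):
    """Pick k features greedily: relevance minus correlation with prior picks."""
    A = np.abs(C)
    chosen, remaining = [], list(range(len(scores)))
    for _ in range(k):
        best = max(remaining,
                   key=lambda j: scores[j] - beta * sum(A[j, c] for c in chosen))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Toy check: features 0 and 1 are near-duplicates (correlation 0.99).
# A plain top-k ranking would keep both; the redundancy-aware selector
# keeps feature 0 and replaces feature 1 with the less correlated feature 2.
scores = np.array([1.0, 0.95, 0.5, 0.4])
C = np.eye(4)
C[0, 1] = C[1, 0] = 0.99
picked = greedy_low_redundancy(scores, C, k=2)
```

Embedding a penalty of this shape into the matrix-factorization objective, as the thesis does, pushes the optimizer itself away from assigning large weights to correlated feature pairs, rather than fixing redundancy after the fact.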
Keywords/Search Tags: Feature selection, Unsupervised feature selection, Sparse learning, Latent space, Redundancy minimization