
Embedded Unsupervised Feature Selection Based On Sparse Learning

Posted on: 2020-01-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y Fan
Full Text: PDF
GTID: 2518306518463104
Subject: Software engineering
Abstract/Summary:
With the advent of artificial intelligence and the era of big data, the number of features used to describe data has exploded in many domains. High-dimensional features can degrade the performance of learning algorithms, increase computational and storage costs, and easily lead to overfitting. Feature selection aims to select the most representative features and eliminate redundant ones from the original data, and has proven to be an effective way to reduce dimensionality. In the real world, unlabeled data is increasingly common because manual labeling is laborious and prior knowledge is often lacking. Without labels, unsupervised feature selection is vital for the comprehensive analysis of unlabeled high-dimensional data. At present, sparse learning is widely used in unsupervised feature selection: it couples the feature selection process with a learning model and applies a sparse regularization term to the feature selection matrix, so that training the model yields a sparse solution consistent with the semantics of feature selection. However, existing methods are not perfect and have certain limitations; for example, they do not consider the distribution information in the feature space or the redundancy relationships between features. From the perspective of sparse learning, this thesis proposes two new unsupervised feature selection algorithms.

The first work is a sparse unsupervised feature selection method based on latent space embedding. Inspired by multimodal learning, we treat the feature space and the pseudo-label space each as a modality of the data. Through joint dictionary learning, we obtain a latent space shared by the feature space and the pseudo-label space, which reflects the consistent distribution information of the two modalities. We use spectral clustering to learn better pseudo-label information, ensuring the completeness of the latent space. To select a subset of features that simultaneously preserves the distribution information of the feature space and the pseudo-label space, we introduce a linear regression model that minimizes the fitting error from the feature space to the latent space. By applying an ℓ2,1-norm regularization term to the feature selection matrix, features that are weakly related to the latent space are discarded. Clustering results on datasets from different fields verify the superiority of the algorithm.

The second work is a sparse unsupervised feature selection method based on redundancy minimization. We extend the traditional matrix-factorization-based approach, addressing its main shortcoming: it ignores the redundancy among features. Highly correlated features typically receive similar weights or rankings, so if the correlation between features is not considered when selecting top-ranked features, the final result contains redundant information. To this end, we define a regularizer that penalizes highly correlated features and embed it, together with matrix factorization, into a coherent model. We further investigate the effect of ℓ2,p-norm regularization on the feature selection framework. For the resulting optimization problem, we design an efficient iterative algorithm. Finally, experimental results on image datasets validate the effectiveness of the proposed approach.
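The sparse-learning recipe underlying the first work can be sketched as follows: fit a target matrix (a pseudo-label or latent representation) with a linear model under an ℓ2,1-norm penalty, then rank features by the row norms of the learned selection matrix. This is a minimal illustration using the standard iteratively reweighted least-squares update for the ℓ2,1 term; the synthetic data and variable names are hypothetical, not taken from the thesis.

```python
import numpy as np

def l21_regression(X, Y, lam=1.0, n_iter=50):
    """Minimize ||X W - Y||_F^2 + lam * ||W||_{2,1} over W.

    Classic reweighting trick: replace the l2,1 term by tr(W^T D W)
    with D diagonal, D_ii = 1 / (2 ||w_i||_2), then alternate between
    the closed-form W update and refreshing D.
    """
    n, d = X.shape
    D = np.eye(d)
    for _ in range(n_iter):
        # Closed-form update: W = (X^T X + lam * D)^{-1} X^T Y
        W = np.linalg.solve(X.T @ X + lam * D, X.T @ Y)
        row_norms = np.linalg.norm(W, axis=1)
        D = np.diag(1.0 / (2.0 * np.maximum(row_norms, 1e-8)))
    return W

def select_features(W, k):
    """Score each feature by the l2 norm of its row in W; keep the top k."""
    scores = np.linalg.norm(W, axis=1)
    return np.argsort(scores)[::-1][:k]

# Toy check: the target depends only on the first three features,
# so those rows of W should dominate after training.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
Y = X[:, :3] @ rng.standard_normal((3, 4))   # pseudo-label/latent target
W = l21_regression(X, Y, lam=0.5)
top = select_features(W, 3)
```

The row-wise ℓ2,1 penalty is what makes whole rows of W shrink together, which is exactly the "discard features weakly related to the latent space" behavior described above.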
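The redundancy-minimization idea of the second work can be illustrated with a simplified stand-in: a penalty that charges pairs of correlated features by the product of their weight-row norms, and a greedy selector that trades relevance against correlation with already-chosen features. The penalty form, the `beta` trade-off, and the greedy procedure are illustrative assumptions, not the thesis's exact model.

```python
import numpy as np

def redundancy_penalty(W, C):
    """Illustrative regularizer: sum over i != j of |C_ij| * ||w_i|| * ||w_j||.

    C is the feature correlation matrix; highly correlated feature pairs
    that both carry large weights are penalized the most.
    """
    norms = np.linalg.norm(W, axis=1)
    A = np.abs(C).copy()
    np.fill_diagonal(A, 0.0)       # a feature is not redundant with itself
    return float(norms @ A @ norms)

def greedy_low_redundancy(scores, C, k, beta=1.0):
    """Pick k features greedily: relevance minus correlation with prior picks."""
    A = np.abs(C)
    chosen, remaining = [], list(range(len(scores)))
    for _ in range(k):
        best = max(remaining,
                   key=lambda j: scores[j] - beta * sum(A[j, c] for c in chosen))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Toy check: features 0 and 1 are near-duplicates (correlation 0.99).
# A plain top-k ranking would keep both; the redundancy-aware selector
# keeps feature 0 and replaces feature 1 with the less correlated feature 2.
scores = np.array([1.0, 0.95, 0.5, 0.4])
C = np.eye(4)
C[0, 1] = C[1, 0] = 0.99
picked = greedy_low_redundancy(scores, C, k=2)
```

Embedding a penalty of this shape into the matrix-factorization objective, as the thesis does, pushes the optimizer itself away from assigning large weights to correlated feature pairs, rather than fixing redundancy after the fact.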
Keywords/Search Tags: Feature selection, Unsupervised feature selection, Sparse learning, Latent space, Redundancy minimization