
Feature Selection Algorithm And Its Application In Image Clustering And Recognition

Posted on: 2024-08-05  Degree: Master  Type: Thesis
Country: China  Candidate: L L Guo  Full Text: PDF
GTID: 2568307127953349  Subject: Software engineering
Abstract/Summary:
Feature selection, also known as attribute selection, belongs to feature engineering and is a basic component of modern machine learning. As a data preprocessing strategy, feature selection is important in preparing high-dimensional data for machine learning and pattern recognition. It can overcome the curse of dimensionality, simplify the algorithm model, reduce training time, help prevent over-fitting, and enhance generalization ability. Because labels are often scarce, research on high-dimensional data analysis has focused on unsupervised feature selection methods. However, most existing unsupervised feature selection methods assume that the data are independent and identically distributed, and they select features poorly when the data are missing, noisy, or otherwise corrupted. This paper makes no such assumptions; it focuses on missing and noisy data and studies sparse regression models based on latent representations, together with the corresponding unsupervised feature selection methods. The three main works of this paper are as follows (a simplified sketch of the latent-representation pipeline they share is given after this abstract):

1) The first work gives a robust unsupervised feature selection model based on maximum multi-step Markov transition probabilities and latent representation learning (MMLRL). The model characterizes the manifold structure of the data by the maximum multi-step Markov transition probability, learns the latent representation of the data by a symmetric non-negative matrix factorization model, and finally embeds the latent representation into a sparse regression model so that feature selection is performed in the latent representation space of the data.

2) The second work develops a latent and symmetric low-rank representation sparse regression model for unsupervised feature selection on the data matrix (LLRRSC). First, LLRRSC uses the coefficient matrix of a non-negative symmetric low-rank representation to construct an affinity graph matrix that characterizes the correlations between data points, thereby revealing their intrinsic geometric relationships, global structure, and discriminative information. Latent representation learning is then performed on the affinity graph matrix to obtain the latent representations of all data points and to mine the interconnection information between them. The learned latent representation matrix is regarded as a pseudo-label matrix, which provides discriminative information for feature selection. Finally, unlike previous algorithms that select features in the original data space, the proposed algorithm performs feature selection in the learned latent space through sparse linear regression on the latent representations of the data points.

3) The third work proposes a sparse regression model with dual latent and low-rank representations under symmetric constraints (DLLRRSC). Under the symmetric constraint, the DLLRRSC model jointly learns affinity graph matrices for the data and for the features, which characterize the correlations between data points and between features from the low-rank representation of the data and the low-rank representation of the features, respectively. The latent representations of the data and of the features are then obtained separately by latent representation learning. Finally, the latent representation matrix of the data space is used as the pseudo-label matrix and the latent representation of the features as the regression coefficient matrix to build a sparse regression model that selects features in the learned dual latent space.

Experiments on a variety of public datasets show that the proposed algorithms offer improvements in both theory and performance, and have application value in image recognition and clustering.
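All three models share a sparse regression core in a learned latent space. As a generic sketch (an assumption about the common form; the thesis's exact objectives add model-specific graph, low-rank, and symmetry terms): with data matrix X in R^{n x d} and a learned latent (pseudo-label) matrix V in R^{n x c}, feature selection reduces to the l2,1-regularized regression

    \min_{W} \|XW - V\|_F^2 + \lambda \|W\|_{2,1},

where the row norms of the learned coefficient matrix W rank the original features.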
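The following Python sketch is an illustrative simplification of the pipeline described for the first two works, not the thesis's exact algorithms; the function names, update rules, and parameter values here are hypothetical choices. It builds an affinity matrix from multi-step Markov transition probabilities, learns a latent representation by symmetric non-negative matrix factorization, and scores features by the row norms of an l2,1-regularized sparse regression.

import numpy as np

def multistep_markov_affinity(X, n_neighbors=5, n_steps=3):
    """Affinity from the element-wise maximum of 1..n_steps Markov transition matrices."""
    n = X.shape[0]
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)   # pairwise distances
    sigma = np.median(d) + 1e-12
    W = np.exp(-(d ** 2) / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    idx = np.argsort(-W, axis=1)[:, :n_neighbors]               # keep k nearest neighbors
    mask = np.zeros_like(W, dtype=bool)
    mask[np.arange(n)[:, None], idx] = True
    W = np.where(mask | mask.T, W, 0.0)                         # symmetrized k-NN graph
    P = W / (W.sum(axis=1, keepdims=True) + 1e-12)              # one-step transition matrix
    A, Pk = np.zeros_like(P), np.eye(n)
    for _ in range(n_steps):
        Pk = Pk @ P
        A = np.maximum(A, Pk)                                   # max over 1..n_steps steps
    return (A + A.T) / 2

def symmetric_nmf(A, n_components, n_iter=200, seed=0):
    """Learn V >= 0 with A ~ V V^T by damped multiplicative updates."""
    rng = np.random.default_rng(seed)
    V = np.abs(rng.standard_normal((A.shape[0], n_components)))
    for _ in range(n_iter):
        num = A @ V
        den = V @ (V.T @ V) + 1e-12
        V *= 0.5 * (1.0 + num / den)
    return V

def l21_sparse_regression(X, V, lam=0.1, n_iter=100):
    """min_W ||X W - V||_F^2 + lam * ||W||_{2,1}, solved by iterative reweighting."""
    d = X.shape[1]
    D = np.eye(d)
    for _ in range(n_iter):
        W = np.linalg.solve(X.T @ X + lam * D, X.T @ V)
        row_norms = np.linalg.norm(W, axis=1) + 1e-12
        D = np.diag(1.0 / (2.0 * row_norms))
    return W

def select_features(X, n_latent=10, n_selected=50):
    A = multistep_markov_affinity(X)
    V = symmetric_nmf(A, n_latent)                              # latent (pseudo-label) matrix
    W = l21_sparse_regression(X, V)                             # regression in the latent space
    scores = np.linalg.norm(W, axis=1)                          # row norms rank the features
    return np.argsort(-scores)[:n_selected]

# Usage on random data, for illustration only
X = np.random.default_rng(0).standard_normal((100, 200))
print(select_features(X, n_latent=8, n_selected=20)[:10])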
Keywords/Search Tags: latent representation, low-rank representation, sparse regression, feature selection, unsupervised