
Unsupervised Feature Selection Based On Sparse Regression

Posted on: 2019-09-22
Degree: Master
Type: Thesis
Country: China
Candidate: J W Chang
Full Text: PDF
GTID: 2428330572458936
Subject: Circuits and Systems
Abstract
With the rapid development of the Internet and the Internet of Things, the data produced in scientific research and daily life are now high-dimensional, often with tens of thousands of dimensions. Such high-dimensional data contain important features, redundant features, unimportant features, and even noise features. The purpose of data analysis is to extract meaningful information from big data for tasks such as clustering or classification. However, the explosive growth of data imposes ever greater time and space demands on traditional analysis methods. We therefore need dimension reduction methods to pre-process high-dimensional data and obtain clean representations in a low-dimensional space. In recent years, many feature selection algorithms based on sparse regression have been proposed. However, these algorithms still have limitations, such as sensitivity to the embedding dimension, insufficient sparsity, and insufficient use of the information in the data. This thesis therefore improves on these aspects to overcome the deficiencies of existing models. Its main contributions are the following:

(1) A new algorithm, unsupervised feature selection based on nonnegative matrix factorization and regularized sparse regression (JMFSR), is proposed. JMFSR aims to find a more suitable pseudo-label indicator matrix. First, nonnegative matrix factorization with an orthogonality constraint is adopted to learn a parts-based data representation. Then, a feature weight matrix is learned through a regularized sparse regression model. In addition, an l2,1-norm constraint is imposed on the sparse regression term and the feature weight matrix, which allows a representative feature subset to be selected effectively.

(2) A new algorithm, unsupervised feature selection based on self-representation sparse regression and local similarity preserving (UFSRL), is proposed. UFSRL aims to overcome the sensitivity of general feature selection algorithms to the embedding dimension and their insufficient sparsity. First, UFSRL sparsely reconstructs the original data rather than fitting a low-dimensional embedding. Second, it uses manifold learning to preserve the local similarity of the data. In addition, it constrains the coefficient matrix with the l2,1/2-matrix norm to ensure row sparsity, which makes the UFSRL model sparse and robust to noise.

(3) A new algorithm, self-representation and nonnegative matrix factorization-based mixed-graph regularized feature selection (SRMFMR), is proposed. SRMFMR aims to address the insufficient use of data information by general feature selection algorithms. First, the self-representation matrix is factorized nonnegatively to obtain a new feature selection matrix and a coefficient matrix. Then, SRMFMR adopts a mixed graph model, constructing a global graph in the sample space and a local graph in the feature space. SRMFMR can therefore preserve both the global information of the sample space and the local similarity information of the feature space.
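The methods above all score features by the row norms of a row-sparse coefficient matrix. As a rough illustration of this shared idea, and not of the thesis's exact models, the sketch below solves the self-representation problem min_W ||X - X W||_F^2 + lam * ||W||_{2,1} by iteratively reweighted least squares and ranks features by the l2-norms of the rows of W. The function name, the lam value, the iteration count, and the toy data are illustrative assumptions.

```python
import numpy as np

def l21_self_representation_fs(X, lam=1.0, n_iter=50, eps=1e-8):
    """Unsupervised feature ranking via self-representation sparse regression:
        min_W ||X - X W||_F^2 + lam * ||W||_{2,1}
    solved by iteratively reweighted least squares (IRLS).
    Features whose rows of W have large l2-norm are considered representative.
    """
    n, d = X.shape
    XtX = X.T @ X
    D = np.eye(d)  # IRLS reweighting matrix, starts as the identity
    for _ in range(n_iter):
        # Closed-form update for the weighted ridge subproblem
        W = np.linalg.solve(XtX + lam * D, XtX)
        row_norms = np.linalg.norm(W, axis=1)
        # D_ii = 1 / (2 ||w_i||_2), guarded against division by zero
        D = np.diag(1.0 / (2.0 * np.maximum(row_norms, eps)))
    scores = np.linalg.norm(W, axis=1)  # feature importance = row l2-norm
    ranking = np.argsort(scores)[::-1]  # best features first
    return ranking, scores

# Toy data: features 0 and 1 are informative, features 2-4 are tiny noise
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 2))
X = np.hstack([base, 0.01 * rng.normal(size=(100, 3))])
ranking, scores = l21_self_representation_fs(X, lam=0.1)
```

The l2,1-norm couples the entries of each row, so the IRLS update drives whole rows of W toward zero; on the toy data the two informative features receive much larger scores than the noise features. The l2,1/2-matrix norm used by UFSRL enforces stronger row sparsity than the l2,1-norm sketched here.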
Keywords: feature selection, sparse regression, regularization, nonnegative matrix factorization, similarity preserving, l2,1/2-matrix norm, mixed graph, self-representation