
Research On Unsupervised Feature Selection Algorithm Based On Graph

Posted on: 2022-10-21
Degree: Master
Type: Thesis
Country: China
Candidate: B Xu
Full Text: PDF
GTID: 2480306545953819
Subject: Control Engineering
Abstract/Summary:
With the development of network information technology and the widespread use of high-tech products, the data collected in many industries are growing rapidly in both volume and content, and are characterized by high dimensionality and complexity. High-dimensional data put tremendous pressure on storage and downstream applications: they are rich in useful features, but they also contain many irrelevant and redundant features as well as noise. Processing such raw data directly runs into the "curse of dimensionality", which greatly increases the computational cost of a model and makes it prone to overfitting, so that its performance in practice is unsatisfactory. In addition, most real-world data are unlabeled, and manually labeling large amounts of high-dimensional data after collection is impractical. Selecting useful features from high-dimensional data to alleviate the curse of dimensionality has therefore become a hot research topic.

This thesis addresses feature selection for unlabeled high-dimensional data and proposes two effective unsupervised feature selection algorithms. The second improves on the first to speed up the processing of large-sample, high-dimensional data and to improve practical performance. The main work consists of the following two parts:

1. Most unsupervised feature selection algorithms based on graph learning explore the local manifold structure of the data insufficiently, learn the graph inefficiently, and require complex tuning of model parameters. To address these problems, an l2,0-norm-constrained orthogonal locality preserving projection unsupervised feature selection (OLPPFS) method based on graph embedding learning is proposed. First, locality preserving projection (LPP) is used to explore the local geometric manifold structure of the data, while the projection directions are constrained to be orthogonal to strengthen the linear mapping and facilitate data reconstruction. Then, sparse regularization is used to construct a sparse projection matrix for selecting features. For the similarity graph, the traditional KNN graph construction, which measures the similarity between data points with a Gaussian kernel function, is abandoned. Inspired by the Constrained Laplacian Rank (CLR) clustering algorithm, a sparsely connected similarity graph is built from the original sample data: a highly sparse similarity matrix is learned from the data matrix, and its formula involves no parameters that need tuning. Most unsupervised feature selection algorithms use the l2,1 norm as the sparse regularizer. The l2,1 norm is convex and therefore easier to optimize than the non-convex, non-smooth l2,0 norm, but it introduces a regularization parameter that is difficult to tune. In this thesis, an iterative algorithm is designed to solve the NP-hard l2,0-norm-constrained problem directly. The resulting algorithm selects informative features effectively without complex parameter tuning.

2. Building on OLPPFS, a fast orthogonal locality preserving projection unsupervised feature selection algorithm (FOLPPFS) based on an anchor strategy is proposed for tasks with large sample sizes and high-dimensional features. It strengthens the algorithm's ability to handle large-sample, high-dimensional data and further improves its performance. During graph learning, the construction of the similarity matrix is accelerated by embedding anchors. Compared with the nearly full-rank, non-stochastic similarity matrix produced by the traditional k-nearest-neighbor method, the similarity matrix built with the anchor strategy is sparse, low-rank, symmetric, positive semidefinite (PSD), and stochastic.

Experimental results on selected public benchmark data sets show that the proposed algorithms perform better than other algorithms, especially in running speed.
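
To make the two row-sparsity norms mentioned in the abstract concrete, the following is a minimal sketch in plain Python. It is an illustration only, not the thesis's OLPPFS solver: the matrix `W`, the tolerance `tol`, and the function names are hypothetical, and ranking features by the row norms of the projection matrix is assumed here as a common convention in sparse-projection feature selection rather than taken from the abstract.

```python
import math


def row_norms(W):
    """Euclidean norm of each row of W (one row per original feature)."""
    return [math.sqrt(sum(w * w for w in row)) for row in W]


def l21_norm(W):
    """l2,1 norm: the sum of the row norms (a convex function of W)."""
    return sum(row_norms(W))


def l20_norm(W, tol=1e-12):
    """l2,0 norm: the number of rows whose norm exceeds tol (assumed tolerance)."""
    return sum(1 for r in row_norms(W) if r > tol)


def select_features(W, k):
    """Indices of the k rows with the largest norms, in ascending index order."""
    norms = row_norms(W)
    ranked = sorted(range(len(W)), key=lambda i: norms[i], reverse=True)
    return sorted(ranked[:k])


# Hypothetical projection matrix: 4 original features mapped to 2 dimensions.
# Rows 1 and 3 are all zero, so those features do not contribute.
W = [
    [3.0, 4.0],
    [0.0, 0.0],
    [0.0, 1.0],
    [0.0, 0.0],
]

print(l21_norm(W))            # 6.0 (row norms 5 + 0 + 1 + 0)
print(l20_norm(W))            # 2 (two non-zero rows)
print(select_features(W, 2))  # [0, 2]
```

In this sketch, a constraint of the form "l2,0 norm of W equals k" fixes the number of selected features directly, whereas an l2,1 penalty only encourages zero rows and needs a weight to control how many survive.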
Keywords/Search Tags: unsupervised, graph learning, sparse regularization, feature selection, l2,0 norm, anchor strategy