Research On Semi-supervised Feature Selection Algorithm Based On Graph Learning

Posted on: 2022-10-28
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Tao
GTID: 2518306485485994
Subject: Computer Science and Technology
Abstract/Summary:
The continued growth of the Internet has produced massive amounts of high-dimensional data in which many dimensions carry relatively little value. Finding the valuable dimensions among them is a problem studied by many researchers. At the same time, because data volumes grow rapidly and assigning category labels is expensive, more and more data are unlabeled. Feature selection and semi-supervised learning have therefore become research hotspots. In addition, because graph models can approximate the manifold structure of data and express data well, graph-based semi-supervised feature selection has attracted attention. It combines feature selection, semi-supervised learning, and graph learning, and has become one of the key research directions in feature selection. A semi-supervised feature selection algorithm based on graph learning uses a graph to mine the structural relationships between labeled and unlabeled data, which yields more structural information, while the labeled data provide category information. Together these improve the result of feature selection.

Current research on such algorithms usually focuses only on the shallow structural relationships between data points and on classification accuracy after feature selection. It ignores both the influence that the structural information within each feature dimension has on feature importance and the fact that data are misclassified in practice. Moreover, earlier semi-supervised feature selection algorithms based on graph learning compute data similarity from Euclidean distance, which seriously degrades the quality of the constructed graph matrix when the data are noisy and high-dimensional. This thesis therefore takes data misclassification, the internal structure of the feature dimensions, and the quality of the graph matrix as its starting points, and makes the following contributions.

(1) Static-graph semi-supervised feature selection based on cost sensitivity and feature graphs. Earlier semi-supervised feature selection algorithms based on graph learning pursue high accuracy while ignoring misclassification, and consider only the shallow relationships between data points, not the structural relationships of the data within each feature dimension. To address this, a novel semi-supervised feature selection algorithm is proposed that combines cost-sensitive learning, semi-supervised learning, and feature graphs. First, graph learning is introduced into semi-supervised learning to mine and express the local and global structural information between data points. Second, cost-sensitive learning is introduced into the graph-based semi-supervised feature selection model to account for misclassification: different costs are defined for different types of misclassification, and the misclassification cost of the data is computed from them. In addition, feature vectors are transformed into feature graphs that store the structural information between pairs of data points in each feature dimension. The feature graphs provide more internal structural information and are used to construct a feature information interaction matrix. The goal is to maximize the correlation between candidate features and target features while minimizing the redundancy among candidate features, so that the selected features are highly correlated with the target features and have low redundancy among themselves. Finally, the ℓ1-norm is used for sparse regularization to reduce the amount of computation.

(2) Dynamic-graph semi-supervised feature selection based on cost sensitivity and self-expressing graphs. Earlier semi-supervised feature selection algorithms based on graph learning rely only on simple Euclidean distance to measure the similarity of data points in space. As noisy data accumulate and the dimensionality of the data space grows, this measure captures the relationships between data points less and less well, and may fail entirely. The constructed graph matrix is then of poor quality, which harms both the machine learning model and the subsequent feature selection results. Self-expression-based graph learning is therefore used in place of traditional graph learning to improve the quality of the constructed graph, and dynamic graph learning is used to adjust the graph and the projection matrix jointly, which improves the accuracy of the projection matrix. In addition, earlier feature selection methods based on least squares regression learn a projection matrix and use it to measure feature importance, an approach that lacks theoretical explanation. Here, a set of measurement factors is used to rank the features and to adjust the least squares regression coefficients, so that a global and sparse solution of the projection matrix is learned. Finally, misclassification always occurs in practice, and different types of misclassification have different costs. Combining dynamic graph learning, self-expression, cost-sensitive learning, and the squared ℓ2,1-norm, a dynamic-graph semi-supervised feature selection algorithm based on cost sensitivity and self-expressing graphs is proposed.

To address the shortcomings of existing semi-supervised feature selection based on graph learning, this thesis proposes new semi-supervised feature selection algorithms built around cost-sensitive learning, feature graphs, self-expression, and dynamic graph learning. The experimental results are verified and analyzed with multiple evaluation metrics under uniform conditions, and they show that the proposed algorithms are effective.
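
Three ingredients named in the abstract can be illustrated with a minimal sketch: ranking features by the row norms of a least squares projection matrix, the ℓ2,1-norm, and a cost matrix that prices different misclassifications differently. This is not the algorithm of the thesis, whose objective the abstract does not spell out. Plain ridge regression stands in for the sparse, graph-regularized model, and every function name, parameter, and cost value below is a hypothetical chosen for illustration.

```python
import numpy as np


def l21_norm(W):
    """Return the l2,1-norm of W: the sum of the Euclidean norms of its rows."""
    return float(np.sum(np.linalg.norm(W, axis=1)))


def rank_features_by_projection(X, Y, reg=1.0):
    """Fit ridge-regularized least squares X @ W ~ Y and rank features.

    Assumption: ridge regression is a stand-in for the thesis's model.
    Row i of W maps feature i to the class indicators, so a larger row
    norm is read as a more important feature.
    """
    n_features = X.shape[1]
    # Closed-form ridge solution: (X^T X + reg * I) W = X^T Y
    W = np.linalg.solve(X.T @ X + reg * np.eye(n_features), X.T @ Y)
    scores = np.linalg.norm(W, axis=1)
    order = np.argsort(-scores)  # feature indices, most important first
    return order, W


def misclassification_cost(y_true, y_pred, cost):
    """Sum cost[i, j] over samples with true class i predicted as class j."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.asarray(cost)[y_true, y_pred].sum())


# Toy data (illustrative only): the label depends on feature 0 alone.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
labels = (X[:, 0] > 0).astype(int)
Y = np.eye(2)[labels]  # one-hot class indicator matrix

order, W = rank_features_by_projection(X, Y)
print("feature ranking:", order)
print("l2,1-norm of W:", l21_norm(W))

# Hypothetical costs: missing class 1 is five times worse than a false alarm.
cost = np.array([[0.0, 1.0],
                 [5.0, 0.0]])
print("total cost:", misclassification_cost([0, 1, 1, 0], [0, 0, 1, 1], cost))
```

On this toy data the informative feature should be ranked first, because its row of W is the only one with a large norm. Penalizing the ℓ2,1-norm of W, instead of the squared Frobenius norm used by ridge regression, would push whole rows toward zero, which is why that norm is common in sparse feature selection.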
Keywords/Search Tags: Feature selection, Semi-supervised, Cost-sensitive, Feature graph, Self-expression