Research On Semi-Supervised Feature Selection Algorithm Based On Graph

Posted on: 2021-04-04
Degree: Master
Type: Thesis
Country: China
Candidate: Z C Zhou
GTID: 2518306107498824
Subject: Control Engineering
Abstract/Summary:
With the development of artificial intelligence technology, the data sets being generated are becoming increasingly complex, and the features of the raw sample data are correspondingly diverse. Among these numerous features are irrelevant features, redundant features, and noise, which not only increase the computational cost of a model but also make it prone to overfitting and degrade the performance of the learning model. To eliminate redundant features, reduce the impact of data noise, and mitigate the curse of dimensionality, feature selection has attracted sustained attention from researchers and is widely applied across machine learning. In this thesis, two feature selection algorithms are proposed to address feature selection for large-sample data and high-dimensional data, respectively. The main work consists of the following two parts:

1. An adaptive-loss semi-supervised sparse feature selection framework based on Hessian regularization (AHFS) is proposed for genetic diagnosis tasks with few labeled samples and high-dimensional features. First, because of the null-space property of the Hessian matrix, the local geometric structure inherent in the data manifold can be well preserved, which favors learning functions whose values vary linearly with geodesic distance. Second, most existing semi-supervised feature selection algorithms use the l2-norm as the loss function to measure the error of the predicted labels, but outliers with large losses make such models sensitive and poorly robust. Using the l1-norm as the loss function alleviates the sensitivity to outliers to some extent, but it is sensitive to small losses. To overcome the shortcomings of loss functions based on the l1-norm and the l2-norm, an adaptive loss is used to measure the error of the predicted label matrix, and the optimal Hessian matrix is obtained through an adaptive neighbor assignment strategy, which enhances the robustness of the feature selection model. In addition, using the l2,1-norm as an implicit regularization constraint on the projection matrix W yields sparser regression coefficients and improves the discriminative power of the selected feature subset.

2. A semi-supervised feature selection algorithm based on a self-adjusting graph (SAGFS) is proposed for video semantic recognition data with a large number of samples and high-dimensional features. Unlike traditional semi-supervised feature selection algorithms, which rely directly on the initial graph Laplacian matrix, SAGFS learns a new sparse similarity graph matrix to replace the original similarity matrix, making the proposed model insensitive to the initial data. Moreover, while the new similarity graph matrix is being learned, it adjusts itself according to the local geometric structure of the input training data and the feature selection process. By embedding the optimal sparse similarity graph, SAGFS incorporates graph regularization so that the geometric structure is embedded into manifold learning. Then, by measuring how well a simple linear regression function fits the soft label matrix, the optimal projection matrix and the soft label matrix are obtained. In addition, the l2,p-norm and an accompanying parameter are used to obtain a row-sparse projection matrix for efficient feature selection. Finally, the superior performance of SAGFS is demonstrated on video semantic recognition data sets and other real-world data sets.
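As a rough illustration of two ideas named in the abstract, the sketch below shows an adaptive loss that interpolates between the l1-like and l2-like regimes, and a row-wise l2,1-norm used to rank features by the rows of a projection matrix W. The abstract does not give the exact formulas, so the adaptive loss here assumes one commonly used form, sum_i (1 + sigma) * ||e_i||^2 / (||e_i|| + sigma); the function names `adaptive_loss`, `l21_norm`, and `rank_features` and the parameter `sigma` are hypothetical and not taken from the thesis.

```python
import numpy as np

def adaptive_loss(E, sigma):
    """Assumed adaptive loss over the rows e_i of an error matrix E.

    Computes sum_i (1 + sigma) * ||e_i||^2 / (||e_i|| + sigma), with sigma > 0.
    Small sigma behaves like the sum of row norms (l2,1-like, robust to outliers);
    large sigma behaves like the squared Frobenius norm (l2-like).
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    row_norms = np.linalg.norm(E, axis=1)
    return float(np.sum((1.0 + sigma) * row_norms**2 / (row_norms + sigma)))

def l21_norm(W):
    """l2,1-norm: the sum of the Euclidean norms of the rows of W."""
    return float(np.sum(np.linalg.norm(W, axis=1)))

def rank_features(W, k):
    """Return the indices of the k rows of W with the largest l2 norm.

    Each row of W corresponds to one input feature, so rows driven toward
    zero by a row-sparsity penalty mark features that can be discarded.
    """
    scores = np.linalg.norm(W, axis=1)
    return np.argsort(-scores)[:k]

# Hypothetical toy data: three features projected onto two outputs.
W = np.array([[0.0, 0.0],
              [3.0, 4.0],
              [1.0, 0.0]])
top_two = rank_features(W, 2)  # features 1 and 2 have the largest row norms
```

With a very small `sigma` the loss is close to `l21_norm(E)`, and with a very large `sigma` it approaches the squared Frobenius norm of `E`, which is the trade-off between outlier robustness and sensitivity to small losses that the abstract describes.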
Keywords/Search Tags: Semi-supervised, graph regularization, feature selection, high dimensional data, l2,p-norm