
Research on Semi-Supervised Sparse Feature Selection Algorithms Based on Least Squares Regression

Posted on: 2022-09-03
Degree: Master
Type: Thesis
Country: China
Candidate: X P Wu
Full Text: PDF
GTID: 2518306740462534
Subject: Computer Science and Technology

Abstract/Summary:
The development of information dissemination technology has caused exponential growth of high-dimensional digital information, such as images, text, and video, on the Internet, which greatly restricts data analysis and prediction. How to handle high-dimensional data effectively has become one of the research hotspots in data mining and computer vision. Feature selection, as one of the important techniques for data dimensionality reduction, has received wide attention from researchers. Meanwhile, the rapid growth of diversified data makes the cost of obtaining labels increasingly uncontrollable, so feature selection methods for semi-supervised data have become a popular direction in the field. In particular, sparse feature selection methods based on least squares regression learn and select the most discriminative feature subset through an efficient learning model. However, existing least-squares-based sparse feature selection methods often fail to make good use of the information in semi-supervised data, rarely consider the relationship between features and labels comprehensively, and mostly ignore the redundancy among features. In response to these problems, this paper designs three algorithms that improve the original least-squares-based sparse feature selection model from different perspectives and achieve good results. The three contributions of this paper are summarized as follows:

1. A novel semi-supervised feature selection algorithm based on local adaptation and redundancy minimization (SFS-LARLRM) is proposed. To reduce the influence of outliers and noise, local regression learning and adaptive learning are introduced into the semi-supervised sparse feature selection framework. Then, to remove redundant features, a redundancy regularization that penalizes highly correlated features is adopted. In
addition, mutual information and the Pearson correlation coefficient are used to compute the feature similarity matrix. Finally, an iterative optimization method is designed, and its convergence is proved both theoretically and experimentally.

2. A novel low-dimensional Hessian semi-supervised sparse feature selection algorithm considering feature manifolds (HSLF) is proposed. This method designs a new low-dimensional Hessian regularization and introduces it into the semi-supervised sparse feature selection framework, which better preserves the local manifold structure of the low-dimensional space. A similarity graph is then constructed from the feature perspective, and a feature-manifold regularization is adopted to make the feature selection matrix smooth along the feature manifold structure. In addition, an L2,1/2-norm sparse model is introduced to make the feature selection matrix sparser. Finally, an effective iterative method is proposed to solve the objective function.

3. A novel semi-supervised Hessian multi-label feature selection method based on maximum relevance and minimum redundancy (S2MFSHMRMR) is proposed. In this method, Hessian regularization and the Hilbert-Schmidt independence criterion (HSIC) are used, respectively, to capture the intrinsic local geometric structure of the data and to model the correlation between features and multiple labels. Then, a new redundancy regularization is proposed, in which the redundancy between features is measured by a GMM-based Bhattacharyya distance. The soft labels used in computing the Bhattacharyya distance are obtained through a multi-label label propagation method. Finally, a closed-form solution based on the matrix Lagrange multiplier method is derived to minimize the objective function.

All of the above algorithms are evaluated on public data sets, and parameter sensitivity and convergence experiments are also conducted. The experimental
results show that all the algorithms proposed in this paper obtain effective feature subsets that improve the performance of classification algorithms under the condition of limited labels.
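The shared backbone of the three algorithms is sparse feature selection by least squares regression with a row-sparse norm on the projection matrix. The sketch below is a minimal illustrative example of that backbone only, using the standard L2,1-norm penalty solved by iterative reweighting; it is not the thesis's exact models — the semi-supervised graph/Hessian terms, redundancy regularizers, and the L2,1/2-norm variant are omitted, and all function and variable names are illustrative.

```python
import numpy as np

def l21_ls_feature_selection(X, Y, lam=1.0, n_iter=50, eps=1e-8):
    """Sparse feature selection via least squares regression:
        min_W ||X W - Y||_F^2 + lam * ||W||_{2,1}
    solved with the standard iteratively reweighted scheme.
    Rows of W with large L2 norms mark discriminative features."""
    n, d = X.shape
    D = np.eye(d)  # reweighting matrix, initialized to identity
    W = np.zeros((d, Y.shape[1]))
    for _ in range(n_iter):
        # Closed-form update: W = (X^T X + lam * D)^{-1} X^T Y
        W = np.linalg.solve(X.T @ X + lam * D, X.T @ Y)
        # Reweight: D_ii = 1 / (2 * ||w_i||_2); eps guards division by zero
        row_norms = np.linalg.norm(W, axis=1)
        D = np.diag(1.0 / (2.0 * np.maximum(row_norms, eps)))
    scores = np.linalg.norm(W, axis=1)       # feature importance scores
    ranked = np.argsort(scores)[::-1]        # features, most important first
    return ranked, W

if __name__ == "__main__":
    # Synthetic data: only the first 3 of 20 features carry signal
    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 20))
    W_true = np.zeros((20, 2))
    W_true[:3] = [[2.0, -1.0], [1.5, 2.0], [-2.0, 1.0]]
    Y = X @ W_true + 0.01 * rng.standard_normal((100, 2))
    ranked, W = l21_ls_feature_selection(X, Y, lam=0.5)
    print("top features:", sorted(ranked[:3].tolist()))
```

The L2,1 penalty zeroes out entire rows of W rather than individual entries, which is what makes row norms usable as a feature ranking; the semi-supervised variants in the thesis add further regularization terms to this same objective before applying an analogous iterative (or closed-form) solver.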
Keywords/Search Tags:Feature selection, Semi-supervised, Least squares regression, Sparse model