
Feature Selection Algorithms Research Based On Self-paced Learning And Robust Estimators

Posted on: 2020-12-20    Degree: Master    Type: Thesis
Country: China    Candidate: J Z Gan    Full Text: PDF
GTID: 2428330596973757    Subject: Computer Science and Technology
Abstract/Summary:
High-dimensional big data in the information age typically exhibits high dimensionality and diverse sources. Because these features are not screened during accumulation, high-dimensional data contain a large number of irrelevant and redundant features, and the features that effectively characterize the data are buried among them. This not only increases the space required to store the data but also consumes considerable computing resources; in particular, once the data dimensionality grows beyond a certain point, the performance of data mining algorithms declines. Dimensionality reduction is therefore essential for addressing the problems raised by high-dimensional data. Feature selection is an effective dimensionality reduction method and outperforms subspace learning in the reliability and interpretability of its results, while subspace learning can be used to explore the internal structure of the data. This thesis therefore combines the advantages of both kinds of methods and proposes two novel feature selection algorithms that account for the influence of noise and outliers. We also consider the manifold structure of real-world big data in order to improve the performance of data mining algorithms. Details are as follows:

(1) To address the problem that existing feature selection models ignore the influence of outliers, which degrades their generalization ability, a novel method combining self-paced learning with sparse feature selection is proposed. Specifically, following self-paced learning theory, the feature selection model is first trained on the most high-confidence samples; the more high-confidence samples among the remaining ones are then added to improve the generalization ability of the initial model, until the generalization ability no longer improves or all training samples have been used. As a result, the selected features can improve the efficiency and effectiveness of multi-output regression. Experimental results on six datasets show that the proposed method is superior to the comparison algorithms.

(2) Traditional feature selection models are susceptible to outliers and fail to take the local manifold structure of the data into account. We propose conducting robust graph dimensionality reduction by learning a transformation matrix that maps the original high-dimensional data into their low-dimensional intrinsic space free from the influence of outliers. To do this, we propose 1) adaptively learning three variables, i.e., a reverse graph embedding of the original data, a transformation matrix, and a graph matrix preserving the local similarity of the original data in their low-dimensional intrinsic space; and 2) employing robust estimators to keep outliers out of the optimization of these three matrices simultaneously. As a result, the original data are cleaned by two strategies, i.e., a prediction of the original data based on the three resulting variables and robust estimators, so that the transformation matrix can be learnt from an accurately estimated intrinsic space with the help of the reverse graph embedding and the graph matrix. Moreover, we propose a new optimization algorithm for the objective function and theoretically prove its convergence. Experimental results show that the proposed method outperforms all comparison methods on different classification tasks.

In summary, this thesis innovatively embeds a self-paced regularizer and a robust factor into the feature selection model. Self-paced learning, as a robust learning strategy, explores the data gradually from easy to hard, while robust estimation assigns lower weights to outliers to minimize their impact; both are combined with manifold learning to explore the internal structure of the data. To evaluate the proposed methods, all experiments were conducted on public datasets and compared against strong recent algorithms, using classification and regression as evaluation tasks. The experimental results show that the proposed methods outperform the comparison algorithms, which demonstrates their effectiveness.
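As an illustration, the self-paced training loop described in (1) can be sketched as follows. This is a minimal sketch, not the thesis's actual model: scikit-learn's Lasso stands in for the sparse feature selection objective, and the function name, the hard easy-sample threshold `lam` (the self-paced "age" parameter), and its growth factor are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

def self_paced_feature_selection(X, y, lam=1.0, growth=1.5, alpha=0.1, max_iter=10):
    """Sketch of self-paced sparse feature selection: start from the
    easiest (lowest-loss) samples and gradually admit harder ones by
    growing the age parameter `lam`."""
    model = Lasso(alpha=alpha)
    weights = np.ones(X.shape[0])              # first round: fit on all samples
    for _ in range(max_iter):
        idx = weights > 0
        if not idx.any():                      # no confident samples left
            break
        model.fit(X[idx], y[idx])
        losses = (y - model.predict(X)) ** 2   # per-sample squared loss
        weights = (losses < lam).astype(float) # hard self-paced weighting
        lam *= growth                          # admit harder samples next round
    return np.flatnonzero(model.coef_ != 0)    # indices of selected features
```

In a full self-paced scheme the weights come from minimizing a self-paced regularizer jointly with the model; the hard 0/1 thresholding above is the simplest special case of that idea.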
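The robust-estimator idea in (2) — assigning lower weights to outliers so they cannot dominate the fit — can be illustrated with Huber weighting inside iteratively reweighted least squares. This is a simplified stand-in, assuming a plain linear model rather than the thesis's joint optimization of the transformation matrix, reverse graph embedding, and graph matrix; the function names and the Huber threshold `delta` are illustrative.

```python
import numpy as np

def huber_weights(residuals, delta=1.0):
    """Huber robust weights: 1 inside the threshold, delta/|r| outside,
    so large-residual points (outliers) are down-weighted."""
    r = np.abs(residuals)
    w = np.ones_like(r)
    mask = r > delta
    w[mask] = delta / r[mask]
    return w

def robust_linear_fit(X, y, delta=1.0, n_iter=20):
    """Iteratively reweighted least squares with Huber weights:
    alternate between a weighted fit and recomputing the weights."""
    w = np.ones(len(y))
    for _ in range(n_iter):
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        w = huber_weights(y - X @ beta, delta)
    return beta, w
```

The same alternating pattern — fit, then reweight by a robust loss — is what lets the optimization of the three matrices in (2) proceed with outliers effectively removed from the estimation.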
Keywords/Search Tags: Self-paced Learning, Reverse Graph, Feature Selection, Robust Estimators, Subspace Learning