
Robustness Research Based On Feature Selection

Posted on: 2020-06-02
Degree: Master
Type: Thesis
Country: China
Candidate: W Zheng
Full Text: PDF
GTID: 2428330596473762
Subject: Computer Science and Technology
Abstract/Summary:
With the growth of data informatization, manually extracting information from data has become very difficult. For this reason, machine learning methods based on statistical ideas have been proposed, among which knowledge-discovery methods such as classification, clustering, and regression analysis are widely used. However, redundant and biased information in the raw data can prevent these knowledge-discovery algorithms from obtaining the expected results, so processing the data in advance is also important in machine learning. Data preprocessing aims to extract the more important and "pure" information from raw data; its most representative technique is feature selection. Based on how the model is trained, traditional feature selection methods fall into three categories: filter, wrapper, and embedded methods. Embedded methods integrate the feature selection process with model training, selecting features automatically during optimization, and have been shown by various studies to outperform filter and wrapper methods. However, although existing feature selection methods can reduce the learning dimensionality of the data to some extent, they still struggle to keep up with the rapid growth of data: the difficulty of high-dimensional data lies not only in the increasing number and size of samples, but also in the redundant features, noise, and outliers that come with that growth. Therefore, this thesis combines self-paced learning, low-rank learning, and spectral learning with the traditional embedded feature selection model, proposing three more robust algorithms that address different issues in high-dimensional data. The main content of the thesis is divided into the following parts:

(1) Robust feature selection based on self-paced learning. This part combines self-paced learning, sparse learning, and feature self-representation to propose a robust unsupervised feature reduction model, Unsupervised Feature Selection based on Self-Paced Learning (UFSSPL). The algorithm uses feature self-representation to achieve unsupervised learning, and uses self-paced learning to address the fact that traditional models ignore differences between samples and are therefore susceptible to outliers. Specifically, UFSSPL first automatically selects an important subset of samples to obtain a robust initial feature selection model, then gradually introduces the remaining samples to improve the model's generalization ability, finally obtaining a selection model that is both robust and generalizable. Clustering experiments show that UFSSPL achieves better results on real data sets than other feature selection algorithms.

(2) Spectral feature selection based on low-rank learning. This part combines low-rank learning, spectral graph learning, feature self-representation, and sparse learning to propose a low-rank spectral feature selection model, Low-rank Unsupervised Feature Selection based on Self-Representation (LFSR). The algorithm combines low-rank and spectral learning to address the limited effectiveness of traditional unsupervised feature selection, which fails to mine the internal structure of the data (i.e., its global and local structure). Specifically, a feature-level self-representation loss function with a sparse regularizer (the ℓ2,1-norm) achieves unsupervised learning and feature selection; low-rank learning and spectral learning then jointly capture the global and local structure of the data to relieve the influence of redundancy and noise. Clustering experiments show that the algorithm achieves better results than the comparison algorithms.

(3) Spectral feature selection based on dynamic graph learning. This part integrates dynamic graph learning and sparse learning into a regression model to propose a dynamic spectral feature selection model, Dynamic Graph Learning for Spectral Feature Selection (DGSFS). The algorithm addresses two problems of traditional feature selection methods: they are easily disturbed by redundancy and noise in the original feature space, and graph construction is separated from the feature selection process. Specifically, the algorithm first uses a supervised regression model with the group-sparse ℓ2,1-norm to build the basic feature selection framework, then adds spectral learning to this framework to dynamically mine the intrinsic local structure of the data from a low-dimensional subspace of the original data, implementing a one-step feature selection strategy. Classification experiments show that the proposed algorithm obtains better feature selection results.

In summary, this thesis studies robustness improvements for the deficiencies of traditional feature selection models, and analyzes the results using classification and clustering algorithms as experimental evaluation methods under different evaluation metrics. To verify the correctness of the algorithms, all algorithms in the thesis are verified and analyzed strictly under a unified experimental environment. The experimental results show that the new algorithms proposed in this thesis outperform the selected comparison algorithms on all metrics. In future work, I will consider applying these techniques directly to clustering, classification, or regression algorithms.
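As an illustrative aid to parts (1) and (2) of the abstract above, the shared core of the described models, a feature-level self-representation objective with an ℓ2,1-norm row-sparsity penalty, plus a self-paced schedule that admits "easy" samples first, can be sketched as follows. This is a minimal sketch under my own assumptions (the IRLS solver, the function names, the binary self-paced weighting, and all parameter choices are illustrative), not the thesis's actual UFSSPL or LFSR algorithms, which additionally involve low-rank and spectral-graph terms.

```python
import numpy as np

def self_representation_fs(X, alpha=1.0, n_iter=30, eps=1e-8):
    """Embedded unsupervised feature selection via feature-level
    self-representation with an l2,1-norm penalty:

        min_W  ||X - X W||_F^2 + alpha * ||W||_{2,1}

    solved by iteratively reweighted least squares (IRLS).
    Returns the (d x d) coefficient matrix W; features are ranked
    by the l2-norm of the corresponding row of W.
    """
    d = X.shape[1]
    G = X.T @ X                      # Gram matrix of the features
    D = np.eye(d)                    # IRLS reweighting matrix
    for _ in range(n_iter):
        # closed-form stationarity condition: (G + alpha * D) W = G
        W = np.linalg.solve(G + alpha * D, G)
        # D_ii = 1 / (2 ||w_i||_2) drives unimportant rows toward zero
        row_norms = np.sqrt((W ** 2).sum(axis=1)) + eps
        D = np.diag(1.0 / (2.0 * row_norms))
    return W

def self_paced_selection(X, alpha=1.0, mu=1.3, n_rounds=5):
    """Self-paced wrapper: fit on the 'easy' samples first (small
    reconstruction residual), then grow the age parameter lam so that
    harder samples are admitted gradually; outliers enter last, if at all."""
    W = self_representation_fs(X, alpha)
    residual = np.linalg.norm(X - X @ W, axis=1)  # per-sample loss
    lam = np.median(residual)        # initial age parameter (a heuristic)
    for _ in range(n_rounds):
        easy = residual <= lam       # binary self-paced sample weights
        W = self_representation_fs(X[easy], alpha)
        residual = np.linalg.norm(X - X @ W, axis=1)
        lam *= mu                    # admit harder samples next round
    scores = np.sqrt((W ** 2).sum(axis=1))
    return np.argsort(scores)[::-1]  # feature indices, most important first

# Toy usage: 12 features, one of which nearly duplicates another.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 12))
X[:, 6] = X[:, 0] + 0.01 * rng.standard_normal(200)  # redundant copy
ranking = self_paced_selection(X, alpha=0.5)
print(ranking[:5])
```

The ℓ2,1-norm is the key design choice: penalizing whole rows of W (rather than individual entries, as an ℓ1-norm would) zeroes out entire features at once, which is what makes the selection embedded in the optimization itself.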
Keywords/Search Tags: Feature selection, Low-rank learning, Self-paced learning, Spectral graph learning, Feature-level self-representation, Sparse learning