Research On Unsupervised Feature Selection Via Feature Level Graph Learning

Posted on: 2021-04-22
Degree: Master
Type: Thesis
Country: China
Candidate: C H Ren
GTID: 2428330626955144
Subject: Computer application technology
Abstract/Summary:
Data representation is one of the fundamental problems in machine learning, data mining and pattern recognition. With the rapid development of data acquisition and related technologies, high-dimensional big data is now widely used in many practical application scenarios. At the same time, low-quality features, such as noisy data and outliers, are inevitably introduced during data acquisition. On the one hand, high-dimensional data provides more detail and can better characterize the intrinsic structure of the data; on the other hand, it incurs higher storage and computational costs and poses great challenges for subsequent learning algorithms.

In recent years, researchers have proposed a variety of methods for dealing with high-dimensional data, among which the representative techniques are dimension reduction and feature selection. Feature selection methods can be roughly divided into three categories, supervised, semi-supervised and unsupervised, according to whether ground-truth labels are available. Unsupervised feature selection does not rely on true data labels, so it is more widely applicable in practice. A large number of unsupervised feature selection algorithms have been proposed, including filter and embedded methods. Generally speaking, filter methods have few hyperparameters and are simple to implement, but their performance is less satisfactory; embedded methods usually involve many hyperparameters and a more complex algorithmic process, and achieve better results only after careful parameter tuning.

Although many unsupervised feature selection methods have been proposed in recent years, these algorithms still have some limitations: (1) Existing algorithms generally use a vector-based data representation for feature selection. When the data contains low-quality features, the fine-grained description of the data neighborhood structure that a vector-based representation produces makes existing feature selection methods vulnerable to interference. (2) Because no label information is available to guide the search, hyperparameter search and optimization for unsupervised feature selection algorithms is not feasible in practical applications. To solve these two problems, two new unsupervised feature selection algorithms are proposed. It is worth pointing out that these two algorithms have been published in an EI-indexed conference and an SCI-indexed international journal. The main properties of the two algorithms are as follows:

(1) A filter unsupervised feature selection algorithm based on feature-level LLE graph construction is proposed. To describe the internal structure of the data based on all candidate features, the algorithm uses locally linear embedding (LLE) to construct a global neighborhood structure graph with all features as input. To describe the representation ability of a single feature, the algorithm takes that single feature as input and again uses LLE to construct a neighborhood structure graph for it. The algorithm thus obtains both a local neighborhood graph based on all features and a local neighborhood graph for each single feature. It measures the ability of a single feature to represent the internal structure of the whole data by the difference between the neighborhood structure graph based on all features and the neighborhood structure graph of that single feature. In addition, the algorithm further improves its robustness and generality through local linear weighting and by drawing on the neighborhood structure graphs of similar features. This algorithm is a filter method; its main difference from existing filter methods is that it represents the data structure with neighborhood structure graphs constructed by feature-level LLE instead of the traditional vector-based data representation. Results on benchmark experiments show that the algorithm outperforms mainstream filter feature selection algorithms.

(2) An embedded unsupervised feature selection algorithm based on feature-level neighbor graph reconstruction is proposed. This algorithm takes all candidate features, and each single feature, as input data and uses the k-nearest-neighbor algorithm to construct a neighborhood structure graph based on all features and a neighborhood structure graph for each single feature. It then reconstructs the neighborhood structure graph based on all features as a linear weighting of the feature-level neighborhood structure graphs, and uses the weights of this linear combination to describe the characterization ability of the features. Like the first method proposed in this thesis, this algorithm uses feature-level graph construction to reflect the characterization ability of features. Learning the feature weights by linear reconstruction of feature graphs captures, on the one hand, the correlation between each single-feature neighborhood graph and the neighborhood graph based on all features and, on the other hand, the redundancy between features. The algorithm therefore balances correlation and redundancy when assessing feature representation ability. In addition, the algorithm obtains the globally optimal feature weights and does not require any related hyperparameters to be set; that is, it is an embedded unsupervised feature selection algorithm without hyperparameters. Experimental results on benchmark data sets show that this algorithm is superior to mainstream unsupervised feature selection algorithms.

In summary, this thesis focuses on unsupervised feature selection, a basic problem in the field of data mining. Using the learning strategy of constructing a neighborhood graph for each single feature to reflect that feature's ability to depict the inner neighborhood structure of the data, it proposes corresponding filter and embedded unsupervised feature selection algorithms in turn. It is worth noting that the unsupervised feature selection algorithm based on feature-level neighbor graph reconstruction obtains the globally optimal feature weights while fully balancing feature correlation and redundancy, and the entire algorithm does not require any additional hyperparameters to be set. In view of these characteristics, the proposed algorithms are expected to have application value in practice.
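
The feature-level graph reconstruction idea behind the embedded algorithm can be illustrated with a small sketch. The abstract does not state the objective function or any constraints, so everything below is an assumption for illustration only and not the author's implementation: each graph is taken to be a symmetric 0/1 k-nearest-neighbor adjacency matrix, the weights are found by unconstrained least squares on min_w ||A_all - sum_j w_j A_j||_F^2, and the function names `knn_graph` and `feature_graph_weights` as well as the default `k=5` are hypothetical.

```python
import numpy as np


def knn_graph(X, k):
    """Symmetric 0/1 k-nearest-neighbour adjacency matrix over the rows of X."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # a single feature becomes an (n, 1) column
    n = X.shape[0]
    # Squared Euclidean distance between every pair of samples.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a sample is never its own neighbour
    idx = np.argsort(d2, axis=1, kind="stable")[:, :k]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    return np.maximum(A, A.T)  # keep an edge if either endpoint selects it


def feature_graph_weights(X, k=5):
    """Weights that linearly reconstruct the all-feature graph from
    the single-feature graphs (assumed objective: plain least squares)."""
    X = np.asarray(X, dtype=float)
    d = X.shape[1]
    target = knn_graph(X, k).ravel()
    # One flattened single-feature graph per column.
    basis = np.stack([knn_graph(X[:, j], k).ravel() for j in range(d)], axis=1)
    w, *_ = np.linalg.lstsq(basis, target, rcond=None)
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = np.repeat([0, 1], 30)
    informative = labels[:, None] * 10.0 + rng.normal(size=(60, 2))
    noise = 0.05 * rng.normal(size=(60, 3))
    X = np.hstack([informative, noise])
    w = feature_graph_weights(X, k=5)
    print("features ranked by weight:", np.argsort(w)[::-1])
```

Because the assumed objective is a convex least-squares problem, its minimizer is a global optimum, which matches the "global optimal" property described in the abstract. Correlated single-feature graphs share explanatory power in the fit, which is one way redundancy between features can lower individual weights. Note that this sketch still needs a neighborhood size `k`; how the thesis handles that choice is not stated in the abstract.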
Keywords/Search Tags: Unsupervised Feature Selection, Local Graph Reconstruction, Parameter Free, Local Linear Embedding, Global Optimal