
Research On Filter-based Unsupervised Feature Selection Algorithms

Posted on: 2024-09-30
Degree: Doctor
Type: Dissertation
Country: China
Candidate: P Huang
Full Text: PDF
GTID: 1528307184465194
Subject: Software engineering
Abstract/Summary:
As an important tool for removing redundant and irrelevant features, filter-based unsupervised feature selection has proven effective in the fields of machine learning, data mining, and bioinformatics. Existing methods fall into three types: (i) methods based on manifold learning; (ii) methods based on feature analysis; (iii) robust methods. This thesis proposes four novel methods that address long-standing and previously unidentified problems in filter-based unsupervised feature selection. The major contributions of this thesis are summarized as follows.

(1) To address the problem of imbalanced neighbors and the problem that redundant features may not be removed, we propose a method based on a t-power Adaptive Graph and Dependency Score (AGDS). The weights of the t-power adaptive graph are the weights of the probability graph raised to the power of t. The dependency score is built from mutual information. AGDS avoids the imbalanced-neighbor problem and selects relatively independent features. The experiments show that AGDS is efficient on large-scale datasets and can identify more important features.

(2) To address the problem that the final similarity matrix, with the intrinsic property embedded, differs too much from the initial matrix, and the problem of high computational complexity, we propose two methods based on a Controllable Adaptive Graph and discriminative feature learning (CAG-I, CAG-U). We first introduce a criterion that measures the difference between two matrices and then incorporate it into our models so that the difference between the initial and final matrices stays under control. Moreover, because the projection matrix is discrete, the computational complexity of our methods is only O(dd'²), where d is the dimension of the data and d' is the number of selected features. The experiments show that CAG-U and CAG-I avoid the imbalanced-neighbor problem while dynamically updating the similarity graph. They are also highly efficient on high-dimensional datasets.

(3) To address the problem that the existing algorithm cannot guarantee that the similarity matrix contains exactly c connected components in all cases, and the problem that mutual information is unsuitable in specific scenarios, we propose a method based on a reliable Structured Graph and Uncorrelation Score (SGUS). The reliable structured graph is guaranteed to contain exactly c connected components whenever the algorithm converges. The uncorrelation score combines variance and HSIC, so it works for both discrete and continuous datasets. The experiments show that SGUS handles both discrete and continuous datasets and captures more precise relationships among the data.

(4) To address the problem that existing robust unsupervised feature selection methods are unsuitable for scenarios in which both randomly distributed and concentrated outliers are widely present, and the problem that the existing criterion for deciding whether a sample is an outlier is not intuitive, we propose a method based on bi-graph learning and sample weight learning (RUFSDR). The bi-graph consists of a similarity graph and a dissimilarity graph, which reflect the importance of each sample from different perspectives. We therefore introduce a metric for sample importance based on these two matrices and incorporate it into our model so that different kinds of outliers are excluded from the learning process. The experiments show that RUFSDR copes with scenarios in which both randomly distributed and concentrated outliers are widely present.
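Contribution (3) builds its uncorrelation score from variance and HSIC (the Hilbert-Schmidt Independence Criterion). As a rough illustration only, the sketch below computes the standard biased empirical HSIC between two feature columns, trace(KHLH)/(n-1)^2, where K and L are Gram matrices and H is the centering matrix. The Gaussian kernel, the bandwidth `sigma`, and all function names are assumptions made for this example. The abstract does not say which kernel SGUS uses or how HSIC is combined with variance, so neither is reproduced here.

```python
import math


def gaussian_gram(values, sigma=1.0):
    """Gram matrix K[i][j] = exp(-(v_i - v_j)^2 / (2 * sigma^2)) for one feature.

    The Gaussian kernel and `sigma` are illustrative assumptions.
    """
    return [
        [math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for b in values]
        for a in values
    ]


def center(gram):
    """Double-center a symmetric Gram matrix: H K H with H = I - (1/n) 1 1^T."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    total_mean = sum(row_means) / n
    # For a symmetric matrix the column means equal the row means.
    return [
        [gram[i][j] - row_means[i] - row_means[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC between two equally long feature columns."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    kc = center(gaussian_gram(x, sigma))
    lc = center(gaussian_gram(y, sigma))
    # trace(K H L H) equals the element-wise product sum of the centered matrices.
    total = sum(kc[i][j] * lc[i][j] for i in range(n) for j in range(n))
    return total / (n - 1) ** 2


# Toy data: a feature that depends on x, and a constant feature.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
dependent = [2.0 * v for v in x]
constant = [1.0] * len(x)

score_dependent = hsic(x, dependent)
score_constant = hsic(x, constant)
```

On this toy data the dependent feature gets a strictly positive HSIC value, while the constant feature scores zero because its centered Gram matrix vanishes. Unlike a histogram-based mutual information estimate, the kernel formulation needs no discretization of continuous features, which is consistent with the abstract's remark that the score works for both discrete and continuous data.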
Keywords/Search Tags:Unsupervised Feature Selection, Manifold Learning, Adaptive Graph, Structured Graph, Robustness, Feature Analysis