
The Research On Feature Selection Method Under Unsupervised Learning

Posted on: 2021-09-19 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: D Q Ding | Full Text: PDF
GTID: 1487306557455594 | Subject: Statistics
Abstract/Summary:
With the development of Internet technology and society, large volumes of multi-dimensional, structurally complex data are being produced across scientific fields. Much of this data is high-dimensional and unlabeled, which poses serious challenges for analysis. On the one hand, high dimensionality leads to more complex models and reduces their generalization ability. On the other hand, the growth of labeled data lags far behind that of unlabeled data. Faced with such data sets, it is particularly important to discard the noise, retain the essence, and remove redundancy, so that the features carrying key information can be selected quickly and accurately. Because no labels are available to guide the selection, analyzing high-dimensional data under unsupervised conditions is especially challenging. This dissertation focuses on unsupervised feature selection techniques and proposes two methods:

1) Data-representation-based models have been successfully applied to unsupervised feature selection; they define feature importance as the capability to represent the original data via a reconstruction function. However, most existing algorithms conduct feature selection in the original feature space and are therefore easily influenced by its noisy and redundant features. In this work, feature selection is instead performed in the dictionary basis space of the data. Compared with the low-level representation in the original data space, this approach can capture a higher-level, more abstract representation. In addition, a similarity graph is learned simultaneously to preserve the local geometric structure of the data, which has been shown to be critical for unsupervised feature selection. In short, we propose a model, named DGL-UFS, that integrates dictionary learning, similarity graph learning, and feature selection into a unified framework. Experiments on various types of real-world datasets demonstrate the effectiveness of the proposed DGL-UFS framework.

2) Traditional unsupervised feature selection algorithms usually assume that the data instances are identically distributed and that there is no dependency between them. However, data instances are not only associated with high-dimensional features but are also inherently interconnected with each other. The traditional similarity graph used in previous methods can only describe pairwise relations among data points; it cannot capture high-order relations, so the complex structures implied in the data cannot be sufficiently exploited. In this work, we propose an unsupervised feature selection method that embeds latent representation learning into feature selection. Instead of measuring feature importance in the original data space, feature selection is carried out in the learned latent representation space, which is more robust to noise. To capture the local manifold structure of the original data in a high-order manner, a hypergraph is adaptively learned and embedded into the resulting model. In addition, an efficient alternating algorithm is developed to solve the optimization problem. Experimental results on eight benchmark data sets demonstrate the feasibility of the model and the effectiveness of the algorithm.

Based on a survey of the current state of feature selection research and an analysis of the strengths and shortcomings of existing unsupervised feature selection algorithms, this dissertation investigates unsupervised feature selection in depth and proposes two effective methods that build on prior work. The proposed methods extract higher-level abstract representations of the data features, build feature representation models in the dictionary space and the latent representation space, and capture and retain local structural information between features in a high-order way. The research explores issues related to unsupervised feature selection and proposes novel solutions for data analysis, enriching the research in this field.
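Both methods summarized above score features in a learned representation space rather than in the raw feature space. The abstract does not give the objective functions, so the following is only a minimal sketch of that general idea under stated assumptions: a truncated SVD stands in for the learned dictionary or latent representation, and feature importance is read from the row norms of a projection matrix fitted with an l2,1-norm penalty (a common choice in this literature, not confirmed by the source). The function name `l21_feature_scores`, the parameter `alpha`, and the toy data are all hypothetical.

```python
import numpy as np

def l21_feature_scores(X, V, alpha=0.1, n_iter=30, eps=1e-8):
    """Score features by the row norms of W, where W approximately minimizes
    ||X W - V||_F^2 + alpha * ||W||_{2,1} via iterative reweighting.
    Assumption: this generic objective is a stand-in, not the dissertation's model."""
    d = X.shape[1]
    XtX = X.T @ X
    XtV = X.T @ V
    D = np.eye(d)  # reweighting matrix; the first pass is a plain ridge fit
    for _ in range(n_iter):
        # Stationarity condition of the reweighted problem: (X^T X + alpha D) W = X^T V
        W = np.linalg.solve(XtX + alpha * D, XtV)
        row_norms = np.sqrt((W ** 2).sum(axis=1) + eps)
        D = np.diag(1.0 / (2.0 * row_norms))
    return np.sqrt((W ** 2).sum(axis=1))

# Toy data (hypothetical): features 0-4 are mixtures of 3 latent factors,
# features 5-19 are low-variance noise.
rng = np.random.default_rng(0)
n = 200
Z = rng.normal(size=(n, 3))
A = np.array([[1.0, 0.0, 0.0, 1.0, 1.0],
              [0.0, 1.0, 0.0, 1.0, -1.0],
              [0.0, 0.0, 1.0, 0.0, 1.0]])
X = np.hstack([Z @ A, 0.05 * rng.normal(size=(n, 15))])
X = X - X.mean(axis=0)

# Stand-in latent representation: the top-3 left singular vectors of X.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
V = U[:, :3]

scores = l21_feature_scores(X, V)
top3 = np.argsort(scores)[::-1][:3]  # indices of the three highest-scoring features
print(top3)
```

The l2,1 penalty drives whole rows of `W` toward zero, so a feature that contributes little to reconstructing the representation receives a small score. In the dissertation's models the representation itself is learned jointly with the selection, which this sketch does not attempt.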
Keywords/Search Tags: High-dimensional data, Dictionary learning, Latent representation learning, Similarity graph learning, Hypergraph learning, Local structure preservation, Unsupervised feature selection
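The abstract contrasts pairwise similarity graphs with hypergraphs, whose hyperedges can connect more than two samples at once. As a rough illustration only, the sketch below builds a fixed k-nearest-neighbour hypergraph with unit hyperedge weights and forms the commonly used normalized hypergraph Laplacian I - Dv^(-1/2) H W De^(-1) H^T Dv^(-1/2). This is an assumption made for illustration: the dissertation learns its hypergraph adaptively, and the function name `knn_hypergraph_laplacian` and the parameter `k` are hypothetical.

```python
import numpy as np

def knn_hypergraph_laplacian(X, k=5):
    """Build one hyperedge per sample (the sample plus its k nearest neighbours)
    and return the normalized hypergraph Laplacian and the incidence matrix H.
    Assumption: fixed kNN hyperedges with unit weights, not an adaptively learned hypergraph."""
    n = X.shape[0]
    sq = (X ** 2).sum(axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared Euclidean distances
    np.fill_diagonal(dist2, -np.inf)  # each sample sorts first in its own neighbour list
    H = np.zeros((n, n))  # H[v, e] = 1 if vertex v belongs to hyperedge e
    for e in range(n):
        members = np.argsort(dist2[e])[:k + 1]
        H[members, e] = 1.0
    w = np.ones(n)            # unit hyperedge weights
    dv = H @ w                # vertex degrees
    de = H.sum(axis=0)        # hyperedge degrees (k + 1 for every hyperedge here)
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(dv))
    Theta = Dv_inv_sqrt @ H @ np.diag(w / de) @ H.T @ Dv_inv_sqrt
    return np.eye(n) - Theta, H

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 4))
L, H = knn_hypergraph_laplacian(X_demo, k=4)
print(L.shape)
```

A Laplacian of this form is typically used in a smoothness term such as tr(V^T L V), which encourages samples sharing a hyperedge to have similar representations. How the dissertation couples the hypergraph with feature selection is not specified in the abstract.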