
Research On Feature Selection Algorithms Of High-dimensional And Small-sample Size Based On Neighborhood Consistency

Posted on: 2021-03-02
Degree: Master
Type: Thesis
Country: China
Candidate: H L Zeng
Full Text: PDF
GTID: 2428330629480599
Subject: Computer application technology
Abstract/Summary:
With the rapid development of big data technology, data in many domains, such as semantic analysis, image recognition, and gene selection, exhibit the characteristics of high dimensionality and small sample size; that is, the feature space is high-dimensional while the number of samples is small. High-dimensional and small-sample size data raise problems such as the mismatch between the number of features and the number of samples and a skewed class distribution. Driven by these application characteristics, classification learning on high-dimensional and small-sample size data suffers from inefficient computation, low prediction accuracy, failure to identify minority-class samples, model overfitting, poor stability, and large storage overhead. To fully exploit the application value of such data, knowledge discovery from high-dimensional and small-sample size data has gradually become a hot research topic. Feature selection removes irrelevant, noisy, or redundant features, thereby reducing the dimensionality of the feature space. Taking high-dimensional and small-sample size data as the research object, this thesis addresses the application requirements of real scenarios and the open challenges of feature selection on such data, and studies feature selection algorithms for high-dimensional and small-sample size data under the supervised learning setting. The main research contributions are as follows:

(1) A feature selection algorithm for high-dimensional and small-sample size data based on feature perturbation is proposed to address the mismatch between high-dimensional features and the small sample size. First, a feature perturbation strategy is used to define the datum feature and the datum feature space and to construct multiple diverse feature subspaces (a schematic sketch is given below). Second, a subspace learning algorithm based on feature perturbation is proposed. Finally, comparative experiments against seven algorithms on eight data sets demonstrate the effectiveness of the proposed algorithm.

(2) A feature selection algorithm for high-dimensional and class-imbalanced data based on consistency analysis is proposed to address imbalanced class distributions. First, the consistency between the sample distribution and the labels is defined by fusing class information. Second, a forward greedy feature selection algorithm driven by feature importance is designed (see the second sketch below). Finally, experiments comparing eight feature selection algorithms on a dozen data sets show that the proposed algorithm significantly improves prediction accuracy.
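The abstract only outlines the feature perturbation step, so the sketch below is a minimal, assumption-laden reading rather than the thesis's actual algorithm: features are ranked by a simple relevance score (absolute Pearson correlation with the label, a stand-in for the thesis's datum-feature definition), the top-ranked features are treated as the datum feature space, and diverse subspaces are built by swapping a fraction of datum features for randomly drawn non-datum features. The function name build_perturbed_subspaces and the parameters n_datum, n_subspaces, and perturb_ratio are hypothetical.

```python
import numpy as np

def build_perturbed_subspaces(X, y, n_datum=20, n_subspaces=10,
                              perturb_ratio=0.3, seed=0):
    """Hypothetical sketch of the feature-perturbation step: rank features,
    keep the top n_datum as the datum feature space, and perturb it into
    several diverse subspaces."""
    rng = np.random.default_rng(seed)
    # Stand-in relevance score: absolute Pearson correlation with the label.
    relevance = np.abs(np.array([np.corrcoef(X[:, j], y)[0, 1]
                                 for j in range(X.shape[1])]))
    order = np.argsort(relevance)[::-1]
    datum, others = order[:n_datum], order[n_datum:]
    n_swap = int(perturb_ratio * n_datum)
    subspaces = []
    for _ in range(n_subspaces):
        keep = rng.choice(datum, size=n_datum - n_swap, replace=False)   # retained datum features
        extra = rng.choice(others, size=n_swap, replace=False)           # perturbation features
        subspaces.append(np.concatenate([keep, extra]))
    return datum, subspaces
```

Each subspace could then be handed to a base feature selector or classifier, with features kept according to how often they are chosen across subspaces; this aggregation step is likewise an assumption about how the subspace learning algorithm might be realized.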
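In the same hedged spirit, the following sketch illustrates one way a neighborhood-consistency score and a forward greedy selector could fit together; the radius delta, the optional class_weight used to up-weight minority classes, and the function names neighborhood_consistency and forward_greedy_selection are assumptions for illustration, not the consistency measure or importance function defined in the thesis.

```python
import numpy as np

def neighborhood_consistency(X, y, features, delta=0.15, class_weight=None):
    """Average fraction of same-label samples inside each sample's
    delta-neighborhood, computed in the subspace spanned by `features`
    (features assumed scaled to [0, 1]); class_weight optionally up-weights
    minority-class samples for the imbalanced setting."""
    if len(features) == 0:
        return 0.0
    Xs = X[:, features]
    score, weight_sum = 0.0, 0.0
    for i in range(len(y)):
        dist = np.linalg.norm(Xs - Xs[i], axis=1)
        inside = dist <= delta                       # neighborhood (includes the sample itself)
        agreement = np.mean(y[inside] == y[i])       # label consistency inside the neighborhood
        w = 1.0 if class_weight is None else class_weight[y[i]]
        score += w * agreement
        weight_sum += w
    return score / weight_sum

def forward_greedy_selection(X, y, delta=0.15, class_weight=None, eps=1e-4):
    """Forward greedy search: repeatedly add the feature whose inclusion
    yields the largest consistency gain, stopping when no candidate improves
    the score by more than eps."""
    remaining = list(range(X.shape[1]))
    selected, best_score = [], 0.0
    while remaining:
        score, best_f = max((neighborhood_consistency(X, y, selected + [f],
                                                      delta, class_weight), f)
                            for f in remaining)
        if score - best_score <= eps:
            break
        selected.append(best_f)
        remaining.remove(best_f)
        best_score = score
    return selected
```

As a quick usage note, forward_greedy_selection(X, y) on features scaled to [0, 1] returns the indices of the selected features; the greedy loop mirrors the feature-importance-driven search described in contribution (2), but the specific consistency measure here is only illustrative.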
Keywords/Search Tags: feature selection, neighborhood consistency, high-dimensional and small-sample size, class imbalance