
Research On Feature Selection Algorithms Of High-dimensional And Small-sample Size Based On Neighborhood Consistency

Posted on: 2021-03-02
Degree: Master
Type: Thesis
Country: China
Candidate: H L Zeng
Full Text: PDF
GTID: 2428330629480599
Subject: Computer application technology
Abstract/Summary:
With the rapid development of big data technology, data in many domains, such as semantic analysis, image recognition, and gene selection, exhibit the characteristics of high dimensionality and small sample size; that is, the feature space is high-dimensional while the number of samples is small. High-dimensional and small-sample size data raise problems such as the mismatch between the number of features and the number of samples and a skewed class distribution. Driven by these application characteristics, classification learning on high-dimensional and small-sample size data suffers from inefficient computation, low prediction accuracy, failure to identify minority-class samples, model overfitting, poor stability, and large storage overhead. To fully exploit the application value of such data, knowledge discovery from high-dimensional and small-sample size data has gradually become a hot research topic. Feature selection removes irrelevant, noisy, or redundant features, thereby reducing the dimensionality of the feature space. Taking high-dimensional and small-sample size data as the research object, this thesis addresses the application requirements of real scenarios and the open challenges of feature selection on such data, and studies feature selection algorithms for high-dimensional and small-sample size data under the supervised learning setting. The main research contributions are as follows:

(1) A feature selection algorithm for high-dimensional and small-sample size data based on feature perturbation is proposed to address the mismatch between high-dimensional features and the small sample size. First, a feature perturbation strategy is used to define the datum feature and the datum feature space and to construct multiple diverse feature subspaces (a schematic sketch is given below). Second, a subspace learning algorithm based on feature perturbation is proposed. Finally, comparative experiments against seven algorithms on eight data sets demonstrate the effectiveness of the proposed algorithm.

(2) A feature selection algorithm for high-dimensional and class-imbalanced data based on consistency analysis is proposed to address imbalanced class distributions. First, the consistency between the sample distribution and the labels is defined by fusing class information. Second, a forward greedy feature selection algorithm driven by feature importance is designed (see the second sketch below). Finally, experiments comparing eight feature selection algorithms on a dozen data sets show that the proposed algorithm significantly improves prediction accuracy.
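The abstract only outlines the feature perturbation step, so the sketch below is a minimal, assumption-laden reading rather than the thesis's actual algorithm: features are ranked by a simple relevance score (absolute Pearson correlation with the label, a stand-in for the thesis's datum-feature definition), the top-ranked features are treated as the datum feature space, and diverse subspaces are built by swapping a fraction of datum features for randomly drawn non-datum features. The function name build_perturbed_subspaces and the parameters n_datum, n_subspaces, and perturb_ratio are hypothetical.

```python
import numpy as np

def build_perturbed_subspaces(X, y, n_datum=20, n_subspaces=10,
                              perturb_ratio=0.3, seed=0):
    """Hypothetical sketch of the feature-perturbation step: rank features,
    keep the top n_datum as the datum feature space, and perturb it into
    several diverse subspaces."""
    rng = np.random.default_rng(seed)
    # Stand-in relevance score: absolute Pearson correlation with the label.
    relevance = np.abs(np.array([np.corrcoef(X[:, j], y)[0, 1]
                                 for j in range(X.shape[1])]))
    order = np.argsort(relevance)[::-1]
    datum, others = order[:n_datum], order[n_datum:]
    n_swap = int(perturb_ratio * n_datum)
    subspaces = []
    for _ in range(n_subspaces):
        keep = rng.choice(datum, size=n_datum - n_swap, replace=False)   # retained datum features
        extra = rng.choice(others, size=n_swap, replace=False)           # perturbation features
        subspaces.append(np.concatenate([keep, extra]))
    return datum, subspaces
```

Each subspace could then be handed to a base feature selector or classifier, with features kept according to how often they are chosen across subspaces; this aggregation step is likewise an assumption about how the subspace learning algorithm might be realized.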
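In the same hedged spirit, the following sketch illustrates one way a neighborhood-consistency score and a forward greedy selector could fit together; the radius delta, the optional class_weight used to up-weight minority classes, and the function names neighborhood_consistency and forward_greedy_selection are assumptions for illustration, not the consistency measure or importance function defined in the thesis.

```python
import numpy as np

def neighborhood_consistency(X, y, features, delta=0.15, class_weight=None):
    """Average fraction of same-label samples inside each sample's
    delta-neighborhood, computed in the subspace spanned by `features`
    (features assumed scaled to [0, 1]); class_weight optionally up-weights
    minority-class samples for the imbalanced setting."""
    if len(features) == 0:
        return 0.0
    Xs = X[:, features]
    score, weight_sum = 0.0, 0.0
    for i in range(len(y)):
        dist = np.linalg.norm(Xs - Xs[i], axis=1)
        inside = dist <= delta                       # neighborhood (includes the sample itself)
        agreement = np.mean(y[inside] == y[i])       # label consistency inside the neighborhood
        w = 1.0 if class_weight is None else class_weight[y[i]]
        score += w * agreement
        weight_sum += w
    return score / weight_sum

def forward_greedy_selection(X, y, delta=0.15, class_weight=None, eps=1e-4):
    """Forward greedy search: repeatedly add the feature whose inclusion
    yields the largest consistency gain, stopping when no candidate improves
    the score by more than eps."""
    remaining = list(range(X.shape[1]))
    selected, best_score = [], 0.0
    while remaining:
        score, best_f = max((neighborhood_consistency(X, y, selected + [f],
                                                      delta, class_weight), f)
                            for f in remaining)
        if score - best_score <= eps:
            break
        selected.append(best_f)
        remaining.remove(best_f)
        best_score = score
    return selected
```

As a quick usage note, forward_greedy_selection(X, y) on features scaled to [0, 1] returns the indices of the selected features; the greedy loop mirrors the feature-importance-driven search described in contribution (2), but the specific consistency measure here is only illustrative.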
Keywords/Search Tags: feature selection, neighborhood consistency, high-dimensional and small-sample size, class imbalance