
Comparative Study on Classification Methods for Two Types of High-Dimensional, Small-Sample Data

Posted on: 2020-11-22
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Huang
GTID: 2428330590450621
Subject: Software engineering
Abstract/Summary:
With the advancement of scientific research, fields such as image processing and gene microarray analysis have produced increasingly complex high-dimensional, small-sample data. Although a larger number of dimensions can describe the data more comprehensively, computational complexity grows with dimensionality, and a small sample size leads to problems such as model over-fitting, which hinders classification. To explore how to classify high-dimensional, small-sample datasets effectively, this thesis collects two types of such data, gene data and face image data, and preprocesses them with normalization and standardization. Given these characteristics, it uses feature selection based on the Least Absolute Shrinkage and Selection Operator (Lasso) and feature selection based on feature correlation, combined with support vector machine (SVM) and K-Nearest Neighbor (KNN) classifiers, to carry out experiments. The experiments show that, on gene data, classification based on Lasso feature selection performs better, while classification based on integrated feature selection generalizes better and is more interpretable; on face image data, classification based on integrated feature selection outperforms classification based on Lasso feature selection, but neither achieves an ideal classification result.

To address the poor performance of both feature-selection-based methods on face image datasets, an end-to-end convolutional neural network model is used to process the image data; it classifies better than the feature-selection-based methods on both the pixraw10P and Yale image datasets. A multi-layer perceptron model is also applied to the three gene datasets, and its classification results are worse than those of the feature-selection-based methods.

In summary, when classifying high-dimensional, small-sample datasets, the nature of the dataset can first be judged using prior knowledge. For gene data containing a large number of noise features, classification based on feature selection is preferable; for face image data, convolutional neural network classification can be used. This thesis studies several feature selection and classification methods, runs experiments on a range of datasets, and summarizes the advantages and disadvantages of each method, providing some guidance for handling high-dimensional, small-sample data.
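
The feature-selection pipeline named in the abstract (standardize, select features with Lasso, then classify with KNN) might be sketched as follows. This is a minimal pure-Python illustration, not the thesis's implementation: the function names, the toy data, the penalty `alpha = 0.1`, and `k = 3` are all assumptions made for the example, and the Lasso is solved with plain cyclic coordinate descent.

```python
import math
from collections import Counter


def standardize(X):
    """Z-score every column; constant columns become all zeros."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n)
            for j in range(p)]
    return [[(row[j] - means[j]) / stds[j] if stds[j] > 0 else 0.0
             for j in range(p)] for row in X]


def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold (the Lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimise (1/2n)*||y - Xw||^2 + alpha*||w||_1 over w (y is centred)."""
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    residual = [yi - y_mean for yi in y]  # equals y - Xw because w starts at 0
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            scale = sum(v * v for v in col) / n
            if scale == 0.0:
                continue  # a constant (all-zero) column carries no information
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(col[i] * (residual[i] + w[j] * col[i])
                      for i in range(n)) / n
            new_wj = soft_threshold(rho, alpha) / scale
            delta = new_wj - w[j]
            residual = [residual[i] - delta * col[i] for i in range(n)]
            w[j] = new_wj
    return w


def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training points closest to query."""
    neighbours = sorted((math.dist(row, query), label)
                        for row, label in zip(train_X, train_y))
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: 6 samples, 8 features (more features than samples).
# Only feature 0 differs between the classes; the other seven are noise.
noise = [
    [1.0, 0.0, 2.0, 3.0, 1.0, 4.0, 7.0],
    [2.0, 1.0, 0.0, 3.0, 5.0, 1.0, 7.0],
    [3.0, 2.0, 1.0, 3.0, 2.0, 2.0, 7.0],
]
X = [[5.0] + row for row in noise] + [[1.0] + row for row in noise]
y = [1, 1, 1, -1, -1, -1]

Z = standardize(X)
weights = lasso_coordinate_descent(Z, y, alpha=0.1)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]

# Keep only the selected columns, then classify a query given in the same
# standardized feature space.
Z_selected = [[row[j] for j in selected] for row in Z]
prediction = knn_predict(Z_selected, y, [0.8], k=3)
```

Because the L1 penalty sets the weights of uninformative features exactly to zero, the nonzero entries of `weights` double as the selected feature set. In a real experiment the selection step would need to be fitted inside each cross-validation fold, since selecting features on the full dataset leaks label information into the evaluation.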
Keywords/Search Tags: High-dimensional and small sample data, Feature selection, Neural network