Research On Feature Selection Method Based On Spectral Feature Analysis And Chi-Square Test

Posted on: 2020-08-26  Degree: Master  Type: Thesis
Country: China  Candidate: C F Zhao  Full Text: PDF
GTID: 2428330578971054  Subject: Software engineering
Abstract/Summary:
Dimensionality reduction is an important issue when dealing with large, high-dimensional data. Feature selection chooses a subset of the original features of a large data set, preprocessing the data to obtain a smaller set of representative features. Depending on whether class labels take part in the selection, feature selection methods are divided into supervised, unsupervised, and semi-supervised methods. Unsupervised methods, such as spectral feature selection based on spectral theory, consider only the correlation between features and neglect the correlation between features and categories, so the features they select have weak discriminative power for classification. Some supervised methods, in turn, consider only the correlation between features and classification categories and cannot account for redundancy between features. Many features in the resulting subsets are related to each other, which harms the independence of the features and the accuracy of classification.

Therefore, this paper proposes a feature selection method based on spectral feature analysis and the chi-square test (SpeChi). Spectral analysis is used to evaluate the correlation between features, and the chi-square test is used to evaluate the correlation between features and class labels. The chi-square test, the supervised part, judges how well the theoretical value holds by the deviation between the observed value and the theoretical value; feature selection should give priority to features with higher chi-square values. The spectral clustering method, the unsupervised part, first computes the similarity between each pair of sample points in a given data set to obtain a similarity matrix, then constructs an adjacency graph, and finally uses the normalized cut of the graph to obtain an evaluation criterion for each feature, by which feature selection is performed. SpeChi thus combines the characteristics of supervised and unsupervised learning: the chi-square test uses labeled data in its calculation, while spectral feature analysis uses unlabeled data. The chi-square component compensates for the lack of class correlation in spectral feature analysis, so that the selected feature subsets have low redundancy between features and high correlation between features and categories.

The verification experiments use 4 different classifiers and 8 public data sets. Compared with four other feature selection methods, the algorithm improves the classification accuracy of the selected feature sets and reaches better classification results earlier. Finally, the influence of different parameter settings on the feature selection results of SpeChi is studied. Experiments show that parameter values of 0.4, 0.5, and 0.6 give better classification accuracy than values of 0 or 1. The results differ slightly across parameter values because the parameter changes the relative weight of the two correlations in feature selection.
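
The abstract does not give the SpeChi formulas, so the following is only a minimal Python sketch of how a chi-square score and a graph-based spectral score might be blended by a single weight. Everything named here is an assumption made for illustration: the function names, the Gaussian similarity with width `sigma`, the score f^T L f / f^T D f as a stand-in for the normalized-cut criterion, min-max scaling, and the convention that `alpha = 1` means chi-square only and `alpha = 0` means spectral only. Features are assumed to be already discretized.

```python
import math

def chi_square_score(feature, labels):
    """Pearson chi-square statistic between one discrete feature and the class labels."""
    n = len(feature)
    score = 0.0
    for v in set(feature):
        n_v = sum(1 for x in feature if x == v)
        for c in set(labels):
            n_c = sum(1 for y in labels if y == c)
            observed = sum(1 for x, y in zip(feature, labels) if x == v and y == c)
            expected = n_v * n_c / n  # count expected if feature and class were independent
            score += (observed - expected) ** 2 / expected
    return score

def rbf_similarity(X, sigma=1.0):
    """Pairwise Gaussian similarity matrix over samples (assumed kernel; zero diagonal)."""
    n = len(X)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
                W[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    return W

def spectral_score(feature, W):
    """f^T L f / f^T D f with L = D - W; smaller means smoother over the graph."""
    n = len(feature)
    degree = [sum(row) for row in W]
    cut = 0.5 * sum(W[i][j] * (feature[i] - feature[j]) ** 2
                    for i in range(n) for j in range(n))
    volume = sum(d * f * f for d, f in zip(degree, feature))
    return cut / volume if volume > 0 else 0.0

def min_max(values):
    """Scale scores to [0, 1]; all-equal scores map to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def rank_features(X, labels, alpha=0.5, sigma=1.0):
    """Return feature indices, best first, under the assumed weighted combination."""
    columns = list(zip(*X))
    W = rbf_similarity(X, sigma)
    chi = min_max([chi_square_score(col, labels) for col in columns])
    spec = min_max([spectral_score(col, W) for col in columns])
    # Higher chi-square is better; lower spectral score is better, hence (1 - s).
    combined = [alpha * c + (1 - alpha) * (1 - s) for c, s in zip(chi, spec)]
    return sorted(range(len(columns)), key=lambda j: combined[j], reverse=True)
```

Calling `rank_features(X, labels, alpha=0.5)` on a list of sample rows and a list of class labels returns the feature indices ordered by the blended score. Under this assumed convention, the intermediate settings reported in the abstract (0.4, 0.5, 0.6) correspond to giving both criteria comparable weight.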
Keywords/Search Tags: Feature selection, Spectral theory, Supervised learning, Classification, Chi-square test