Feature Selection Based On Feature Curve Of Subclass Problem

Posted on: 2020-01-24
Degree: Master
Type: Thesis
Country: China
Candidate: L Liu
GTID: 2428330578471049
Subject: Software engineering
Abstract/Summary:
Feature selection is a key step in data preprocessing and an effective dimensionality reduction method. It has been applied in many areas, such as text mining, image processing, intrusion detection, and genomic analysis. A feature selection method identifies and discards irrelevant and redundant features according to some criterion and returns a subset of features, reducing the dimensionality of the data so that the learning algorithm runs more efficiently and produces more accurate results.

Feature selection methods fall into three classes: Filter, Wrapper, and Embedded. The Filter method uses a class separability metric to select, from a feature set, the features that best distinguish the categories. Filters are usually more efficient, but their precision is only average. The Wrapper method combines feature selection with the learning algorithm: a feature subset is evaluated by the performance of the learning algorithm trained on it. Wrappers tend to be more accurate but less efficient. The Embedded method trains a machine learning model, obtains a weight coefficient for each feature, and selects features in descending order of coefficient. It is similar to the Filter method, except that the quality of each feature is determined through training.

Generally speaking, Filter methods use a single score to judge the overall classification ability of a feature across all classes: the higher the score, the stronger the classification ability. However, many studies have indicated that selecting only the highest-scoring features often fails to achieve good results. This thesis therefore introduces a new feature selection method based on the feature curve of the subclass problem (Feature Curve Feature Selection, FCFS). Traditional high-score features are found using the separability metrics of different Filter methods, such as information gain and the chi-square test. The feature curve is then used to find the features with high discriminative ability for each class, and from these the optimal feature subset is obtained.

To verify the validity of FCFS, it is compared with five feature ranking algorithms (CIFE, MRMR, ReliefF, JMI, and DISR) and two feature subset selection algorithms (FCBF and CFS) on the UCI datasets SRBCT, Arrhythmia, Urban, Dermatology, SCADI, Libras, Forest, and Student. The same classifier and evaluation measures were used throughout the comparison. The results demonstrate that the proposed FCFS method is effective.
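
The contrast between one global Filter score and per-class discriminative ability can be illustrated with a minimal Python sketch. This is not the FCFS algorithm: the abstract does not define the feature curve, so the sketch substitutes a simple one-vs-rest information gain for it. The function names, the toy data, and the "union of the top k features per class" selection rule are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Information gain of a discrete feature with respect to the labels."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (len(subset) / n) * entropy(subset)
    return entropy(labels) - conditional


def global_scores(columns, labels):
    """One score per feature over all classes (the usual Filter view)."""
    return [information_gain(col, labels) for col in columns]


def per_class_scores(columns, labels):
    """One-vs-rest scores: how well each feature separates one class from the rest."""
    scores = {}
    for cls in sorted(set(labels)):
        binary = [1 if y == cls else 0 for y in labels]
        scores[cls] = [information_gain(col, binary) for col in columns]
    return scores


def select_union_top_k(scores_by_class, k=1):
    """Union of the k best features for each class (hypothetical rule, not FCFS)."""
    selected = set()
    for scores in scores_by_class.values():
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        selected.update(ranked[:k])
    return sorted(selected)


# Toy data (assumed): three classes, three discrete features stored column-wise.
labels = ["a", "a", "b", "b", "c", "c"]
columns = [
    [1, 1, 0, 0, 0, 0],  # f0 isolates class "a"
    [0, 0, 1, 1, 0, 0],  # f1 isolates class "b"
    [0, 1, 0, 1, 0, 1],  # f2 carries no class information
]

overall = global_scores(columns, labels)
by_class = per_class_scores(columns, labels)
chosen = select_union_top_k(by_class, k=1)
print("global scores:", overall)
print("selected feature indices:", chosen)
```

On this toy data the per-class view makes explicit that f0 is the best feature for class "a" and f1 the best for class "b", while the uninformative f2 is never selected. A global score alone ranks f0 and f1 identically and does not show which class each one serves.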
Keywords/Search Tags:Filter, Feature selection, Feature curve, Subproblem