
Research On Information-theoretic-based Supervised Feature Selection

Posted on: 2022-11-07  Degree: Master  Type: Thesis
Country: China  Candidate: C X Li  Full Text: PDF
GTID: 2518306758491514  Subject: Computer Software and Application of Computer
Abstract/Summary:
In the era of big data, data keeps growing in both quantity and dimensionality, and a critical issue is how to extract useful information from such huge volumes of data. Feature selection is a dimensionality reduction technique that forms an optimal feature subset by discarding irrelevant and redundant features while retaining relevant ones. Depending on whether labels are available, feature selection methods can be divided into supervised, semi-supervised, and unsupervised methods. Supervised feature selection methods use label information to measure the classification ability of features. Because information theory can effectively quantify the relationship between variables, researchers have applied it to feature selection to accurately describe the relationships among features and between a feature and the class label. This thesis mainly studies supervised feature selection methods based on information theory.

Traditional feature selection methods take the incremental classification information that a candidate feature provides about the class as the evaluation criterion; that is, they measure the correlation between a candidate feature and the class given the already selected features. These methods implicitly assume that all selected features influence this measurement equally. In fact, since each selected feature has a different correlation with the label, its impact on measuring the correlation between the candidate feature and the label also differs. To solve this problem, this thesis proposes an information-theoretic weight coefficient that measures the importance of each selected feature, and designs a new feature selection method based on it: the selected-feature-weight based feature selection method (SFWFS). To verify its effectiveness, SFWFS is compared with 5 existing methods on 16 benchmark data sets. The experimental results show that the proposed method outperforms the others in classification accuracy, AUC, and F1 score. A generic sketch of this kind of weighted, greedy, information-theoretic selection is given after this abstract.

Existing feature selection methods aim to maximize new classification information and minimize redundant information simultaneously, so their objective functions are positively correlated with the amount of new classification information and negatively correlated with the amount of redundant information. However, these methods ignore the magnitude of that negative correlation with redundant information, which may cause them to choose a completely irrelevant feature instead of a feature that carries large new classification information but also high redundancy. Therefore, this thesis proposes another new method, the weight of new classification information based feature selection method (WNCIFS), which measures the redundancy between features from a different perspective and employs the weight of new classification information (WNCI) to ensure that candidate features remain related to the class label given the already selected features. To verify its classification accuracy, WNCIFS is compared with five typical feature selection methods, and the experimental results show that it has a clear advantage in classification accuracy.

Finally, to further improve the performance of feature selection, we will continue to explore better ways to measure the correlation between features and the label.
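For illustration only, the following is a minimal Python sketch of a greedy, information-theoretic feature selector in which each already selected feature is weighted by its mutual information with the label when computing the redundancy penalty. The weighting scheme, the function name greedy_weighted_selection, and the toy data are all illustrative assumptions; this is not the exact SFWFS or WNCIFS criterion defined in the thesis.

```python
# Sketch of greedy feature selection with weighted selected features.
# Assumes discrete-valued features so that mutual_info_score applies directly.
import numpy as np
from sklearn.metrics import mutual_info_score


def greedy_weighted_selection(X, y, k):
    """Select k columns of the discrete matrix X (n_samples x n_features).

    Candidate score given the selected set S (illustrative, not the thesis formula):
        J(f) = I(f; y) - sum_{s in S} w_s * I(f; s),
    where w_s = I(s; y) / sum_{s' in S} I(s'; y) weights each selected feature
    by its relevance to the label.
    """
    n_features = X.shape[1]
    relevance = np.array([mutual_info_score(y, X[:, j]) for j in range(n_features)])
    selected = [int(np.argmax(relevance))]  # start with the most relevant feature

    while len(selected) < k:
        weights = relevance[selected] / max(relevance[selected].sum(), 1e-12)
        best_j, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            redundancy = sum(w * mutual_info_score(X[:, j], X[:, s])
                             for w, s in zip(weights, selected))
            score = relevance[j] - redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected


# Toy usage: 100 samples, 6 discrete features, binary label.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(100, 6))
y = (X[:, 0] + X[:, 1] > 2).astype(int)
print(greedy_weighted_selection(X, y, k=3))
```

The design point the sketch shows is the one argued in the abstract: rather than penalizing redundancy against every selected feature equally, the penalty contributed by each selected feature is scaled by how informative that feature is about the label.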
Keywords/Search Tags:Feature Selection, Supervised, Information Theory, Selected Feature Weight, New Classification Information