
Research On Information-theoretic-based Supervised Feature Selection

Posted on: 2022-11-07  Degree: Master  Type: Thesis
Country: China  Candidate: C X Li  Full Text: PDF
GTID: 2518306758491514  Subject: Computer Software and Application of Computer
Abstract/Summary:
In the era of big data, data keeps growing in both quantity and dimensionality, and a critical issue is how to extract useful information from such huge volumes of data. Feature selection is a dimensionality reduction technique that forms an optimal feature subset by discarding irrelevant and redundant features while retaining relevant ones. Depending on whether labels are available, feature selection methods can be divided into supervised, semi-supervised, and unsupervised methods. Supervised feature selection methods use label information to measure the classification ability of features. Because information theory can effectively quantify the relationship between variables, researchers have applied it to feature selection to accurately describe the relationships among features and between a feature and the class label. This thesis mainly studies supervised feature selection methods based on information theory.

Traditional feature selection methods take the incremental classification information that a candidate feature provides about the class as the evaluation criterion; that is, they measure the correlation between a candidate feature and the class given the already selected features. These methods implicitly assume that all selected features influence this measurement equally. In fact, since each selected feature has a different correlation with the label, its impact on measuring the correlation between the candidate feature and the label also differs. To solve this problem, this thesis proposes an information-theoretic weight coefficient that measures the importance of each selected feature, and designs a new feature selection method based on it: the selected-feature-weight based feature selection method (SFWFS). To verify its effectiveness, SFWFS is compared with 5 existing methods on 16 benchmark data sets. The experimental results show that the proposed method outperforms the others in classification accuracy, AUC, and F1 score. A generic sketch of this kind of weighted, greedy, information-theoretic selection is given after this abstract.

Existing feature selection methods aim to maximize new classification information and minimize redundant information simultaneously, so their objective functions are positively correlated with the amount of new classification information and negatively correlated with the amount of redundant information. However, these methods ignore the magnitude of that negative correlation with redundant information, which may cause them to choose a completely irrelevant feature instead of a feature that carries large new classification information but also high redundancy. Therefore, this thesis proposes another new method, the weight of new classification information based feature selection method (WNCIFS), which measures the redundancy between features from a different perspective and employs the weight of new classification information (WNCI) to ensure that candidate features remain related to the class label given the already selected features. To verify its classification accuracy, WNCIFS is compared with five typical feature selection methods, and the experimental results show that it has a clear advantage in classification accuracy.

Finally, to further improve the performance of feature selection, we will continue to explore better ways to measure the correlation between features and the label.
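For illustration only, the following is a minimal Python sketch of a greedy, information-theoretic feature selector in which each already selected feature is weighted by its mutual information with the label when computing the redundancy penalty. The weighting scheme, the function name greedy_weighted_selection, and the toy data are all illustrative assumptions; this is not the exact SFWFS or WNCIFS criterion defined in the thesis.

```python
# Sketch of greedy feature selection with weighted selected features.
# Assumes discrete-valued features so that mutual_info_score applies directly.
import numpy as np
from sklearn.metrics import mutual_info_score


def greedy_weighted_selection(X, y, k):
    """Select k columns of the discrete matrix X (n_samples x n_features).

    Candidate score given the selected set S (illustrative, not the thesis formula):
        J(f) = I(f; y) - sum_{s in S} w_s * I(f; s),
    where w_s = I(s; y) / sum_{s' in S} I(s'; y) weights each selected feature
    by its relevance to the label.
    """
    n_features = X.shape[1]
    relevance = np.array([mutual_info_score(y, X[:, j]) for j in range(n_features)])
    selected = [int(np.argmax(relevance))]  # start with the most relevant feature

    while len(selected) < k:
        weights = relevance[selected] / max(relevance[selected].sum(), 1e-12)
        best_j, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            redundancy = sum(w * mutual_info_score(X[:, j], X[:, s])
                             for w, s in zip(weights, selected))
            score = relevance[j] - redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected


# Toy usage: 100 samples, 6 discrete features, binary label.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(100, 6))
y = (X[:, 0] + X[:, 1] > 2).astype(int)
print(greedy_weighted_selection(X, y, k=3))
```

The design point the sketch shows is the one argued in the abstract: rather than penalizing redundancy against every selected feature equally, the penalty contributed by each selected feature is scaled by how informative that feature is about the label.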
Keywords/Search Tags:Feature Selection, Supervised, Information Theory, Selected Feature Weight, New Classification Information