
Research on Filter-Based Feature Selection Using Information Theory

Posted on: 2019-01-31  Degree: Master  Type: Thesis
Country: China  Candidate: P Zhang  Full Text: PDF
GTID: 2428330548456876  Subject: Engineering
Abstract/Summary:
In an era in which the dimensionality and complexity of data sets are continuously growing, it is essential to extract useful information from huge amounts of largely uninformative data; in particular, it is important to extract the key features of high-dimensional data. Feature selection techniques select the features with high discriminative power while discarding irrelevant or redundant ones. This helps improve the accuracy of classification algorithms, reduce computational complexity, and shorten classifier training time. Feature selection methods are commonly categorized into three groups: filter models, wrapper models, and embedded models. Among these, information-theoretic filter methods have been a hot spot of research.

Traditional feature selection methods based on information theory consider only the correlation between candidate features and class labels, ignoring the features already selected. In fact, as the number of selected features increases, the relevance between candidate features and class labels changes dynamically. To address this problem, this paper presents a new definition of relevance, conditional relevance (CMI): the relevance between a candidate feature and the class labels given each selected feature. On this basis, a novel conditional relevance feature selection algorithm, CRFS, is proposed. The superiority of conditional relevance is first verified in theory. To assess the effectiveness of the proposed method, experiments are performed on 10 real-world datasets with two different classifiers (3NN and SVM). The results show that, compared with 7 typical feature selection algorithms, CRFS provides better classification performance.

In addition, the feature selection process needs to consider the correlation between candidate features and class labels given the selected features, the correlation between selected features and class labels under the influence of candidate features, and the trade-off between these two correlations. To fill this gap, a new definition of relevance, weighted relevance (WR), and a feature selection method named WRFS are proposed together. WRFS introduces two weight coefficients, which use mutual information and joint mutual information to balance the importance of the two kinds of feature relevance terms. To evaluate its classification performance, WRFS is compared with seven feature selection algorithms using two different classifiers on 10 benchmark data sets. Experimental results indicate that WRFS outperforms the other baselines in terms of classification accuracy, AUC, and F1 score.

Regarding feature relevance, CRFS mainly considers the correlation between candidate features and class labels given the selected features, while WRFS focuses on the dynamic trade-off between the candidate features and the selected features. In terms of feature redundancy, CRFS adopts class-independent redundancy, whereas WRFS considers both class-dependent and class-independent redundancy; both methods achieve excellent experimental results.
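The greedy filter framework described above (score each candidate feature against the class labels, conditioning on the features already selected) can be sketched as follows. This is an illustrative sketch using plug-in estimates of mutual information on discrete data, not the thesis's exact CRFS or WRFS scoring function; the function names and the averaging over selected features are our own simplifying assumptions.

```python
import numpy as np

def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in bits for discrete 1-D arrays."""
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            pxy = np.mean((x == xv) & (y == yv))   # joint frequency
            px = np.mean(x == xv)
            py = np.mean(y == yv)
            if pxy > 0:
                mi += pxy * np.log2(pxy / (px * py))
    return mi

def conditional_mutual_information(x, y, z):
    """I(X;Y|Z) = sum_z p(z) * I(X;Y | Z=z), discrete plug-in estimate."""
    cmi = 0.0
    for zv in np.unique(z):
        mask = (z == zv)
        if mask.sum() > 1:
            cmi += np.mean(mask) * mutual_information(x[mask], y[mask])
    return cmi

def greedy_conditional_relevance_selection(X, y, k):
    """Greedy filter: repeatedly pick the candidate feature whose average
    relevance to the labels, conditioned on each already-selected feature,
    is largest (plain mutual information on the first pick).
    A sketch of the conditional-relevance idea, not the CRFS criterion."""
    selected = []
    candidates = list(range(X.shape[1]))
    while len(selected) < k and candidates:
        def score(j):
            if not selected:
                return mutual_information(X[:, j], y)
            return np.mean([conditional_mutual_information(X[:, j], y, X[:, s])
                            for s in selected])
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

As a quick sanity check, a feature that is a copy of the class label carries maximal mutual information with it, so a greedy selector of this form should rank it first ahead of purely random features.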
Keywords/Search Tags: Machine Learning, Information Theory, Feature Selection, Conditional Relevance, Weighted Relevance