
Research On Mutual Information Based Feature Selection Algorithm

Posted on: 2016-01-29
Degree: Master
Type: Thesis
Country: China
Candidate: Y K Guo
GTID: 2308330467980837
Subject: Computer Science and Technology

Abstract/Summary:
Feature selection has become an increasingly important research direction in data mining. Mutual information based feature selection (MIFS) algorithms have gained popularity thanks to their ease of use, efficiency, and strong theoretical foundation, and they have become a focus of feature selection research.

In this thesis, we first systematically introduce the basic theory of mutual information based feature selection, which lays the foundation for deriving the evaluation function of the new algorithms. Second, we review MIFS algorithms in terms of both their search strategies and their evaluation functions, and discuss the merits and drawbacks of each; this review guides the design of the new algorithms. Finally, two kinds of MIFS algorithms are discussed in detail: globally optimal quadratic programming feature selection and heuristic maximal connected subgraph based feature selection. We then improve both algorithms and design experiments to analyze their performance.

We propose a new Rayleigh quotient based solution for the quadratic programming feature selection algorithm, which addresses the problem of solving its quadratic program; we name it the RFSCMI algorithm. This algorithm outputs a ranking of all features, so the number of features must be specified in order to obtain the best feature subset. We also propose a heuristic maximal connected subgraph based feature selection algorithm, named MCSGFS, which addresses the problem of removing too many relevant features. This algorithm does not depend on the number of features in the best subset: it outputs the best feature subset directly, without that number being set in advance.

The experiments use four datasets and two classifiers. The datasets are Ionosphere34, Waveform21, Waveform40, and Wdbc31; the classifiers are Naive Bayes and C4.5.
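As a rough illustration of the global, ranking-based approach, the following Python sketch ranks discrete features by maximizing a Rayleigh quotient x^T H x / x^T x, whose maximizer is the leading eigenvector of the symmetric matrix H. The construction of H (diagonal entries I(X_i; C), off-diagonal entries [I(X_i; C | X_j) + I(X_j; C | X_i)] / 2), the use of natural logarithms, and the power-iteration solver are all assumptions made for this example; they are not taken from the thesis, and the actual RFSCMI objective may differ. All function names are hypothetical.

```python
import math
from collections import Counter


def entropy(*columns):
    """Joint Shannon entropy, in nats, of one or more equal-length discrete columns."""
    n = len(columns[0])
    counts = Counter(zip(*columns))
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def conditional_mi(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


def build_cmi_matrix(features, labels):
    """Assumed construction: diagonal = relevance, off-diagonal = averaged CMI."""
    m = len(features)
    h = [[0.0] * m for _ in range(m)]
    for i in range(m):
        h[i][i] = mutual_information(features[i], labels)
        for j in range(i + 1, m):
            v = 0.5 * (conditional_mi(features[i], labels, features[j])
                       + conditional_mi(features[j], labels, features[i]))
            h[i][j] = h[j][i] = v
    return h


def rayleigh_rank(features, labels, iters=1000, tol=1e-12):
    """Rank features by |x_i|, where x maximizes x^T H x / x^T x (power iteration)."""
    h = build_cmi_matrix(features, labels)
    m = len(h)
    # The largest absolute row sum bounds the spectral radius, so H + shift*I is
    # positive semidefinite with the same eigenvectors; power iteration on it
    # converges to the eigenvector of the largest algebraic eigenvalue of H.
    shift = max(sum(abs(v) for v in row) for row in h)
    x = [1.0 / math.sqrt(m)] * m
    for _ in range(iters):
        y = [sum(h[i][j] * x[j] for j in range(m)) + shift * x[i] for i in range(m)]
        norm = math.sqrt(sum(v * v for v in y))
        if norm == 0.0:  # H is all zeros: no feature carries information
            break
        y = [v / norm for v in y]
        converged = max(abs(a - b) for a, b in zip(x, y)) < tol
        x = y
        if converged:
            break
    return sorted(range(m), key=lambda i: abs(x[i]), reverse=True)


if __name__ == "__main__":
    labels = [0, 0, 1, 1, 0, 1, 0, 1]
    features = [
        labels[:],                 # feature 0: a copy of the class
        [0] * 8,                   # feature 1: constant, carries no information
        [0, 1, 0, 1, 1, 0, 1, 0],  # feature 2: only weakly related to the class
    ]
    print(rayleigh_rank(features, labels))
```

Because the output is a ranking, a subset size still has to be chosen afterwards (for example by evaluating a classifier on the top-k features), which is the limitation the abstract notes for RFSCMI.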
Experiments show that the best feature subset found by the improved algorithm, RFSCMI, achieves higher classification accuracy than those found by QPFS and EQPFS, and that the best feature subset found by the proposed MCSGFS algorithm achieves slightly higher classification accuracy than that found by SOFS. A comprehensive analysis of the experiments on the five MIFS algorithms shows that, in most cases, the globally optimal quadratic programming feature selection algorithms outperform the heuristic maximal connected subgraph based ones, and that RFSCMI performs best of the five.
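A maximal connected subgraph of a graph is one of its connected components. The following hypothetical Python sketch uses that idea to choose a subset without fixing its size in advance: features are vertices, an edge joins two features whose mutual information exceeds a redundancy threshold, and the most relevant feature of each component is kept. The thresholds, the graph construction, and the function names are assumptions for illustration only; this is not the thesis's MCSGFS or SOFS procedure.

```python
import math
from collections import Counter, deque


def entropy(*columns):
    """Joint Shannon entropy, in nats, of one or more equal-length discrete columns."""
    n = len(columns[0])
    counts = Counter(zip(*columns))
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def component_select(features, labels, redundancy_threshold, relevance_threshold=0.0):
    """Hypothetical sketch: keep one representative per connected component.

    Vertices are features whose relevance I(X_i;C) exceeds relevance_threshold;
    an edge joins two vertices when I(X_i;X_j) exceeds redundancy_threshold.
    """
    relevance = [mutual_information(f, labels) for f in features]
    vertices = [i for i, r in enumerate(relevance) if r > relevance_threshold]
    adjacency = {v: [] for v in vertices}
    for pos, i in enumerate(vertices):
        for j in vertices[pos + 1:]:
            if mutual_information(features[i], features[j]) > redundancy_threshold:
                adjacency[i].append(j)
                adjacency[j].append(i)

    seen, selected = set(), []
    for start in vertices:
        if start in seen:
            continue
        # Breadth-first search collects one maximal connected subgraph.
        component, queue = [], deque([start])
        seen.add(start)
        while queue:
            v = queue.popleft()
            component.append(v)
            for w in adjacency[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        # Keep the most relevant feature; the rest of the component is treated as redundant.
        selected.append(max(component, key=lambda f: relevance[f]))
    return sorted(selected)


if __name__ == "__main__":
    a = [0, 1, 0, 1, 0, 1, 0, 1]
    b = [0, 0, 1, 1, 0, 0, 1, 1]
    labels = [x + 2 * y for x, y in zip(a, b)]  # the class depends on a and b jointly
    features = [a, b, a[:], [0] * 8]            # feature 2 duplicates feature 0; feature 3 is constant
    print(component_select(features, labels, redundancy_threshold=0.5, relevance_threshold=0.01))
    # prints [0, 1]
```

The number of selected features falls out of the graph structure, although in this sketch the two thresholds take over the role that the subset size plays in a ranking method.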
Keywords/Search Tags: Feature Selection, Mutual Information, Quadratic Programming, Maximal Connected Subgraph