Mutual information, or information gain, is a good indicator of the relevance or dependence between variables and has been used in feature selection algorithms, commonly known as Mutual Information based Feature Selection (MI-FS) algorithms. The performance of MI-FS algorithms depends on the accuracy of the mutual information estimated from finite observed data samples. However, the common mutual information estimation procedures are biased. In this paper, we introduce an improved plug-in entropy estimator for discrete variables, the Grassberger entropy estimate (GEE), and use it for entropy and mutual information estimation. In decision tree induction, GEE has been shown to yield more accurate mutual information estimates and more effective decision trees. By applying GEE-based mutual information in the normalized mutual information feature selection algorithm (NMIFS), we obtain an improved algorithm, which we name G-NMIFS; this paper focuses on G-NMIFS. To compare the performance of G-NMIFS and NMIFS, we tested both algorithms on six data sets covering two-class and multi-class classification tasks: for each data set, we selected the top k (1 ≤ k ≤ 50) features with each algorithm as a feature subset, trained an RBF-kernel support vector machine (SVM) on each subset, evaluated the subsets by cross-validation classification accuracy, and performed the Wilcoxon signed-rank test. The experimental results show that, on all six data sets, (1) the G-NMIFS feature subsets achieve higher classification accuracy than the NMIFS subsets for most values of k; (2) the G-NMIFS subsets are consistently more accurate than the NMIFS subsets over some consecutive ranges of k; (3) G-NMIFS yields the feature subsets with the best overall classification accuracy; and (4) G-NMIFS statistically outperforms NMIFS. In short, G-NMIFS outperforms NMIFS. On this basis, we plan to integrate the feature selection algorithm into an EEG-based stress monitoring system.
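To illustrate the kind of bias correction involved, the sketch below contrasts a naive plug-in entropy estimate with a Grassberger-style digamma-corrected estimate and uses it to compute the mutual information between two discrete variables. This is a minimal sketch, not the paper's implementation: the function names are ours, and the correction term is one commonly cited form of Grassberger's estimator, which may differ from the exact variant used here.

```python
import numpy as np
from scipy.special import digamma

def plugin_entropy(counts):
    """Naive plug-in (maximum-likelihood) entropy estimate in nats."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    p = counts[counts > 0] / n
    return -np.sum(p * np.log(p))

def grassberger_entropy(counts):
    """Grassberger-style corrected entropy estimate (one common form, assumed here):
    H ~ log(N) - (1/N) * sum_i n_i * G(n_i), with
    G(n) = psi(n) + 0.5 * (-1)^n * (psi((n+1)/2) - psi(n/2))."""
    counts = np.asarray(counts, dtype=float)
    counts = counts[counts > 0]
    n = counts.sum()
    g = digamma(counts) + 0.5 * (-1.0) ** counts * (
        digamma((counts + 1) / 2) - digamma(counts / 2)
    )
    return np.log(n) - np.sum(counts * g) / n

def mutual_information(x, y, entropy=grassberger_entropy):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from a contingency table."""
    _, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1)
    hx = entropy(joint.sum(axis=1))
    hy = entropy(joint.sum(axis=0))
    hxy = entropy(joint.ravel())
    return hx + hy - hxy
```

In a G-NMIFS-style selection loop, `mutual_information` with the corrected entropy would replace the plug-in estimates inside NMIFS's normalized relevance and redundancy terms, leaving the greedy selection procedure itself unchanged.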