
Genetic Algorithm-based Mixed Feature Selection Methods Research

Posted on: 2013-01-28
Degree: Master
Type: Thesis
Country: China
Candidate: N Wang
Full Text: PDF
GTID: 2218330374961968
Subject: Computer software and theory

Abstract/Summary:
Feature selection has long been an active research area in the pattern recognition, statistics, machine learning, and data mining communities, and is widely applied in fields such as text categorization, image retrieval, customer relationship management, intrusion detection, and genomic analysis. The main idea of feature selection is to choose a subset of the input variables by eliminating features that carry little or no predictive information. Feature selection can significantly improve the comprehensibility of the resulting classifier models and often yields models that generalize better to unseen points.

This thesis first reviews the basic concepts and general procedure of feature selection and of the genetic algorithm (GA). According to the evaluation criterion used, feature selection methods fall into two categories: the filter model and the wrapper model. To fully exploit the advantages of both GA and the filter method based on normalized mutual information (NMI), a two-stage feature selection algorithm combining the two is proposed. The NMI-based method belongs to the filter model: it first ranks the features by their normalized mutual information with the class label, and then initializes the GA population with good starting points built from the top-ranked features. The algorithm also employs adaptive crossover and mutation operators. Experimental results show that equal or better prediction accuracy can be achieved with a smaller feature set, and that these feature subsets can be obtained in less time.

By removing redundant, irrelevant, or noisy features, feature selection can improve both the predictive accuracy and the comprehensibility of the resulting predictors or classifiers. Many feature selection algorithms with different selection criteria have been introduced by researchers.
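The abstract gives no implementation details, but the first stage described above (NMI ranking followed by a biased GA initialization) can be sketched as follows. The bias probabilities `p_top`/`p_rest` and the `top_k` cutoff are illustrative assumptions, not values from the thesis:

```python
import math
import random
from collections import Counter

def normalized_mutual_information(x, y):
    """NMI(X;Y) = I(X;Y) / sqrt(H(X) * H(Y)) for two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    hx = -sum(c / n * math.log(c / n) for c in px.values())
    hy = -sum(c / n * math.log(c / n) for c in py.values())
    if hx == 0.0 or hy == 0.0:
        return 0.0
    mi = sum(c / n * math.log(c * n / (px[a] * py[b]))
             for (a, b), c in pxy.items())
    return mi / math.sqrt(hx * hy)

def rank_features(X, y):
    """Rank feature indices by NMI with the class label, best first."""
    scores = [normalized_mutual_information([row[j] for row in X], y)
              for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: -scores[j]), scores

def seed_population(ranking, n_features, pop_size, top_k, rng,
                    p_top=0.8, p_rest=0.1):
    """Build initial GA chromosomes (bit vectors over features), biased
    toward the top_k ranked features as the 'good starting points'."""
    population = []
    for _ in range(pop_size):
        chrom = [0] * n_features
        for pos, j in enumerate(ranking):
            if rng.random() < (p_top if pos < top_k else p_rest):
                chrom[j] = 1
        population.append(chrom)
    return population
```

On a toy dataset where one feature duplicates the class label, `rank_features` places that feature first with an NMI of 1, and `seed_population` then includes it in most initial chromosomes.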
However, no single criterion has proved best for all applications. This thesis therefore proposes a hybrid framework based on a genetic algorithm for feature subset selection that combines several existing feature selection methods. The advantages of this approach include the ability to accommodate multiple feature selection criteria and to find small subsets of features that perform well for the particular inductive learning algorithm used to build the classifier. Experiments were conducted with three existing feature selection methods, which were organically combined with the genetic algorithm. The experimental results demonstrate that the approach finds feature subsets with higher classification accuracy and/or smaller size than each individual feature selection algorithm. Finally, the thesis is summarized and some ideas for future work are discussed.
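A minimal sketch of such a hybrid GA follows, assuming details the abstract does not specify: the fitness here averages the per-feature scores of several filter criteria minus a size penalty (a stand-in for the wrapper accuracy actually optimized), and the adaptive rates follow the well-known Srinivas-Patnaik scheme, where fitter-than-average chromosomes receive lower crossover and mutation probabilities:

```python
import random

def adaptive_rates(f, f_max, f_avg, pc_max=0.9, pm_max=0.1):
    """Adaptive crossover/mutation rates: scale both down linearly
    as a chromosome's fitness approaches the population maximum."""
    if f >= f_avg and f_max > f_avg:
        scale = (f_max - f) / (f_max - f_avg)
        return pc_max * scale, pm_max * scale
    return pc_max, pm_max

def hybrid_fitness(chrom, criterion_scores, size_penalty=0.05):
    """Aggregate several filter criteria: average each selected
    feature's scores across criteria, minus a subset-size penalty."""
    selected = [j for j, bit in enumerate(chrom) if bit]
    if not selected:
        return 0.0
    merit = sum(sum(s[j] for s in criterion_scores) / len(criterion_scores)
                for j in selected)
    return merit - size_penalty * len(selected)

def evolve(population, criterion_scores, rng, generations=30):
    """Minimal GA loop: tournament selection, uniform crossover,
    bit-flip mutation, adaptive rates; returns the best chromosome seen."""
    d = len(population[0])
    best, best_f = None, float("-inf")
    for _ in range(generations):
        fits = [hybrid_fitness(c, criterion_scores) for c in population]
        for c, f in zip(population, fits):
            if f > best_f:
                best, best_f = c[:], f
        f_max, f_avg = max(fits), sum(fits) / len(fits)
        def pick():
            i, j = rng.randrange(len(population)), rng.randrange(len(population))
            return population[i] if fits[i] >= fits[j] else population[j]
        nxt = []
        while len(nxt) < len(population):
            a, b = pick()[:], pick()[:]
            pc, pm = adaptive_rates(max(hybrid_fitness(a, criterion_scores),
                                        hybrid_fitness(b, criterion_scores)),
                                    f_max, f_avg)
            if rng.random() < pc:            # uniform crossover
                for k in range(d):
                    if rng.random() < 0.5:
                        a[k], b[k] = b[k], a[k]
            for child in (a, b):             # bit-flip mutation
                for k in range(d):
                    if rng.random() < pm:
                        child[k] = 1 - child[k]
            nxt.extend([a, b])
        population = nxt[:len(population)]
    return best
```

Given, say, two criteria that both score features 0 and 1 highly and the rest near zero, the loop converges on the chromosome selecting exactly those two features, since adding a weak feature costs more in the size penalty than it adds in merit.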
Keywords/Search Tags:feature selection, classification, mutual information, genetic algorithm, hybrid methods