
Research On Feature Selection Algorithms In Machine Learning

Posted on: 2010-04-04 | Degree: Master | Type: Thesis
Country: China | Candidate: B N Jiang
GTID: 2178360275485993 | Subject: Communication and Information System
Abstract/Summary:
Feature selection has long been a focus of machine learning research. With the emergence of large-scale learning problems such as genome projects, text categorization, and image retrieval, there is an urgent need for feature selection and learning algorithms that are both more accurate and more efficient. Studies in recent years have shown that many learning algorithms are adversely affected by irrelevant and redundant features, and that feature selection is very effective at removing such features, making learning more efficient, improving performance measures such as predictive accuracy, and making learned results easier to understand.

This thesis first reviews the fundamentals of feature selection and introduces two typical feature selection algorithms. Feature selection algorithms broadly fall into the filter model and the wrapper model: the filter model runs fast, while the wrapper model can give better results. To exploit the advantages of both, the thesis proposes a feature selection algorithm based on mutual information and genetic algorithms, the MI-GA algorithm. Experiments show that the algorithm performs well overall with respect to accuracy, feature subset size, and efficiency.

Ensemble learning is another active research topic in machine learning. Both theoretical and empirical research has shown that a good ensemble is one whose base classifiers are accurate and tend to err in different parts of the instance space. One effective way to generate accurate and diverse base classifiers is to train them on different feature subsets, an approach known as ensemble feature selection. The thesis therefore proposes an ensemble learning algorithm based on cross-validation and ReliefF, applying feature selection to ensemble learning.
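The filter/wrapper combination behind MI-GA can be illustrated with a small sketch. The abstract does not say how MI-GA joins the two stages, so everything below is an assumption rather than the thesis's method: a mutual-information filter first ranks discrete features against the class label and keeps the top candidates, and a simple genetic algorithm (tournament selection, one-point crossover, bit-flip mutation, elitism) then searches subsets of those candidates, scoring each with leave-one-out 1-nearest-neighbour accuracy minus a small size penalty. The names `mi_ga_select`, `n_candidates`, and `size_penalty`, and the choice of 1-NN as the wrapped classifier, are hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """I(X;Y) in nats for two equal-length sequences of discrete values."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    # Sum of p(x,y) * log(p(x,y) / (p(x) p(y))), written with raw counts.
    return sum((c / n) * math.log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def loo_1nn_accuracy(X, y, features):
    """Leave-one-out accuracy of 1-nearest-neighbour (Hamming distance) on `features`."""
    if not features:
        return 0.0
    correct = 0
    for i, row in enumerate(X):
        best_j, best_d = None, None
        for j, other in enumerate(X):
            if j == i:
                continue
            d = sum(row[f] != other[f] for f in features)
            if best_d is None or d < best_d:
                best_j, best_d = j, d
        correct += y[best_j] == y[i]
    return correct / len(X)


def mi_ga_select(X, y, n_candidates=5, pop_size=20, generations=30,
                 mutation_rate=0.1, size_penalty=0.01, seed=0):
    """Hypothetical MI filter + GA wrapper; not the thesis's exact MI-GA design."""
    rng = random.Random(seed)
    n_features = len(X[0])

    # Filter stage: rank features by I(feature; label), keep the top candidates.
    scores = [mutual_information([row[f] for row in X], y) for f in range(n_features)]
    candidates = sorted(range(n_features), key=lambda f: -scores[f])[:n_candidates]
    m = len(candidates)

    def decode(mask):
        return [f for f, bit in zip(candidates, mask) if bit]

    cache = {}

    def fitness(mask):
        # Wrapper stage: LOO accuracy minus a small penalty per selected feature.
        key = tuple(mask)
        if key not in cache:
            feats = decode(mask)
            cache[key] = loo_1nn_accuracy(X, y, feats) - size_penalty * len(feats)
        return cache[key]

    population = [[rng.randint(0, 1) for _ in range(m)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = ranked[:2]  # elitism: keep the two best masks unchanged
        while len(next_gen) < pop_size:
            # Tournament selection (size 3) for each parent.
            p1 = max(rng.sample(ranked, 3), key=fitness)
            p2 = max(rng.sample(ranked, 3), key=fitness)
            cut = rng.randint(1, m - 1) if m > 1 else 0
            child = p1[:cut] + p2[cut:]  # one-point crossover
            # Bit-flip mutation.
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            next_gen.append(child)
        population = next_gen
    return decode(max(population, key=fitness))
```

The cheap filter shrinks the search space so that the expensive wrapper evaluations (one leave-one-out pass per distinct mask, cached here) only run over a few candidate features, which is the trade-off between speed and accuracy the abstract attributes to the two models.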
Experiments show that the proposed ensemble algorithm effectively improves the generalization performance of the ensemble.

Although feature selection is widely used in supervised learning, it has received comparatively little attention in unsupervised learning. The thesis gives a preliminary summary of unsupervised feature selection algorithms and introduces a typical one. Finally, the thesis summarizes the research and outlines future work.
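The ensemble method based on cross-validation and ReliefF can likewise be sketched, again under stated assumptions: the abstract names the two ingredients but not how they are combined. Here each cross-validation training split gets its own ReliefF ranking (standard ReliefF with k nearest hits and class-prior-weighted nearest misses, on range-normalised numeric features), the top-ranked features define that member's feature subset, and 1-nearest-neighbour base classifiers combine by simple majority vote. `cv_relieff_ensemble`, `predict`, `top_k`, and the use of 1-NN are hypothetical choices for illustration.

```python
import random
from collections import Counter


def relieff(X, y, n_neighbors=3, n_samples=None, seed=0):
    """ReliefF weights for numeric features; larger weight = more relevant."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    # Range-normalise per-feature differences so every diff lies in [0, 1].
    span = [(max(r[a] for r in X) - min(r[a] for r in X)) or 1.0 for a in range(d)]

    def diff(a, r1, r2):
        return abs(r1[a] - r2[a]) / span[a]

    def dist(r1, r2):
        return sum(diff(a, r1, r2) for a in range(d))

    prior = {c: cnt / n for c, cnt in Counter(y).items()}
    m = min(n_samples or n, n)
    w = [0.0] * d
    for i in rng.sample(range(n), m):
        r, cls = X[i], y[i]
        for c in prior:
            # k nearest neighbours of r within class c (excluding r itself).
            ranked = sorted((dist(r, X[j]), j) for j in range(n) if j != i and y[j] == c)
            near = [j for _, j in ranked[:n_neighbors]]
            if not near:
                continue
            # Hits lower the weight; misses raise it, scaled by the class prior.
            coef = -1.0 if c == cls else prior[c] / (1.0 - prior[cls])
            for a in range(d):
                w[a] += coef * sum(diff(a, r, X[j]) for j in near) / (m * len(near))
    return w


def cv_relieff_ensemble(X, y, n_folds=3, top_k=2, seed=0):
    """Hypothetical: one member per CV training split, each on its own ReliefF top_k features."""
    rng = random.Random(seed)
    order = list(range(len(X)))
    rng.shuffle(order)
    folds = [set(order[f::n_folds]) for f in range(n_folds)]
    members = []
    for f, held_out in enumerate(folds):
        train = [i for i in order if i not in held_out]
        Xt, yt = [X[i] for i in train], [y[i] for i in train]
        w = relieff(Xt, yt, seed=seed + f)
        feats = sorted(range(len(w)), key=lambda a: -w[a])[:top_k]
        members.append((Xt, yt, feats))
    return members


def predict(members, x):
    """Majority vote of 1-NN base classifiers, each restricted to its own features."""
    votes = Counter()
    for Xt, yt, feats in members:
        j = min(range(len(Xt)), key=lambda i: sum((Xt[i][a] - x[a]) ** 2 for a in feats))
        votes[yt[j]] += 1
    return votes.most_common(1)[0][0]
```

Giving each member a different training split and a separately ranked feature subset is one way to obtain the accurate-but-diverse base classifiers the abstract describes; the held-out folds could also be used to weight or prune members, which this sketch omits.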
Keywords/Search Tags: Feature Selection, Mutual Information, Genetic Algorithms, Ensemble Learning, Unsupervised Learning