
Study On Feature Selection And Ensemble Learning Based On Feature Selection For High-Dimensional Datasets

Posted on: 2005-06-22    Degree: Doctor    Type: Dissertation
Country: China    Candidate: L X Zhang    Full Text: PDF
GTID: 1118360152468081    Subject: Computer Science and Technology
Abstract/Summary:
The emergence of high-dimensional machine learning fields such as image processing, information retrieval and bioinformatics poses severe challenges to existing feature selection and machine learning algorithms. This dissertation studies feature selection, and ensemble learning based on feature selection, for high-dimensional datasets. Its main contributions are:

(1) Two two-phase combined feature selection algorithms are designed around the Relief evaluation algorithm: one follows a filter-filter model, the other a filter-wrapper model. In the filter-filter model, the first phase uses Relief to filter out irrelevant features, and the second phase uses correlation analysis to remove redundant features. In the filter-wrapper model, the first phase is the same, while the second phase removes redundant features by backward sequential search, using the performance of the induction algorithm that will be applied after feature selection as the evaluation of candidate feature subsets. Experiments on artificial and real datasets show that the filter-wrapper model outperforms the filter-filter model in accuracy but is much slower, and experiments on artificial datasets show that the filter-filter model removes all or nearly all redundant features.

(2) Based on the respective merits and demerits of Relief and the genetic algorithm within the wrapper model, a coupling of the two is proposed in which the Relief feature weights guide the initialization of the genetic population. The coupling aims to improve the efficiency of the genetic algorithm, which uses classifier performance as the evaluation of feature subsets. Experiments on 17 relatively high-dimensional datasets show that the algorithm achieves good overall performance in terms of accuracy, feature-subset size, and efficiency.

(3) Taking into account both the accuracy of individual classifiers and the diversity among them, the dissertation proposes ReFeatEn, an ensemble learning algorithm based on two-phase feature selection for high-dimensional datasets. Experiments confirm that on high-dimensional datasets the accuracy of ReFeatEn is always higher than or comparable to that of Bagging, Boosting, and the random-subspace ensemble algorithm RandFeatEn. ReFeatEn is also far more efficient than Bagging and Boosting and can be run in parallel, making it well suited to high-dimensional problems.

(4) A scheme for embedding feature selection into the Boosting algorithm is proposed, together with a general algorithmic structure, and corresponding ensemble learning algorithms are designed for the naïve Bayes classifier and the nearest-mean classifier. Experimental results and analysis show that this coupled algorithm alleviates Boosting's sensitivity to noisy features and samples, attains accuracy markedly higher than standard Boosting, and is robust and easy to extend to other classifiers.
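To make the filter-filter model of contribution (1) concrete, the following is a minimal sketch in Python, not code from the dissertation: it assumes a numeric, binary-class dataset held in NumPy arrays, and the helper names relief_weights and drop_redundant, the sampling count, and the relevance and correlation thresholds are illustrative choices rather than values taken from the thesis.

```python
# Sketch of the two-phase filter-filter idea: Relief weights to drop
# irrelevant features, then correlation analysis to drop redundant ones.
import numpy as np

def relief_weights(X, y, n_samples=100, seed=None):
    """Phase 1: estimate feature relevance with the basic Relief update."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    # scale features so per-feature differences are comparable
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xs = (X - X.min(axis=0)) / span
    for _ in range(n_samples):
        i = rng.integers(n)
        dist = np.abs(Xs - Xs[i]).sum(axis=1)
        dist[i] = np.inf                      # exclude the instance itself
        hit = np.argmin(np.where(y == y[i], dist, np.inf))   # nearest same-class
        miss = np.argmin(np.where(y != y[i], dist, np.inf))  # nearest other-class
        # reward features that separate the classes, penalise those that do not
        w += np.abs(Xs[i] - Xs[miss]) - np.abs(Xs[i] - Xs[hit])
    return w / n_samples

def drop_redundant(X, weights, keep, corr_threshold=0.9):
    """Phase 2: among retained features, remove highly correlated ones,
    keeping the higher-weighted feature of each correlated pair."""
    selected = []
    for f in sorted(keep, key=lambda f: -weights[f]):
        corr = [abs(np.corrcoef(X[:, f], X[:, g])[0, 1]) for g in selected]
        if all(c < corr_threshold for c in corr):
            selected.append(f)
    return selected

# Usage on synthetic data: two informative features, one redundant copy, noise.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 300)
informative = np.column_stack([y + rng.normal(0, .3, 300),
                               -y + rng.normal(0, .3, 300)])
redundant = informative[:, :1] + rng.normal(0, .05, (300, 1))  # copy of feature 0
noise = rng.normal(size=(300, 7))
X = np.hstack([informative, redundant, noise])

w = relief_weights(X, y, seed=0)
relevant = [f for f in range(X.shape[1]) if w[f] > 0.05]  # relevance threshold
print("after phase 1 (Relief filter):", relevant)
print("after phase 2 (redundancy removal):", drop_redundant(X, w, relevant))
```

The filter-wrapper variant described in the abstract would replace the correlation-based second phase with a backward sequential search that scores each candidate subset by the accuracy of the induction algorithm to be used afterwards, which is why it is reported as more accurate but much slower.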
Keywords/Search Tags:Feature selection, high-dimensional, ensemble learning, Relief, genetic algorithm