
Research On Feature Subspace Selection And Ensemble Optimization Of Decision Forest

Posted on: 2010-07-15
Degree: Master
Type: Thesis
Country: China
Candidate: H B Li
GTID: 2178360332457858
Subject: Computer Science and Technology

Abstract/Summary:
Ensemble learning is a machine learning approach that improves accuracy and generalization ability by learning multiple classifiers for the same problem. As an effective ensemble learning method, the decision forest is widely used in practice.

When a high-dimensional data set contains a large proportion of features that are not informative about the classes of objects, simple random sampling will select many subspaces that contain no informative features. As a consequence, the trees grown from these subspaces suffer a decrease in average strength, which can badly hurt classification accuracy.

To address this problem, we improve the original decision forest by introducing new subspace selection and model selection methods. The main contributions of this paper are as follows:

1. We give a definition of the feature measure function for random forests. After surveying current approaches to feature selection, we present a mathematical definition of the feature measure function and evaluate four instances of it in experiments: Information Gain, Gain Ratio, Chi-square, and mutual information (a minimal sketch of one such measure appears after this abstract).

2. We propose two new subspace window methods for decision forests, one based on feature counting and one based on accumulating classification information. We ran the improved algorithm on 12 data sets with the four feature measure functions from the first part. The experimental results show that the improved algorithm achieves better classification performance than the original random forest (see the subspace-selection sketch below).

3. We also propose a new model selection method, called the bidirectional vote method. This method uses the in-bag data set to build each decision tree and then uses the out-of-bag data set to evaluate and weight the corresponding tree; the bigger the weight, the better the tree. The trees with the highest weights are then chosen to vote for classification. This new decision forest based on bidirectional voting was run on 12 data sets. The experimental results show that it generally achieves better classification performance than the original random forest (the last sketch below illustrates the idea).
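As an illustration of contribution 1, the following is a minimal sketch of one feature measure function, information gain for a discrete feature. The NumPy-based implementation and function names are our own illustration, not the thesis's code; Gain Ratio, Chi-square, and mutual information would expose the same interface (a score per feature, higher meaning more informative).

```python
import numpy as np

def entropy(y):
    """Shannon entropy of a label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(x, y):
    """Information gain of a discrete feature x with respect to labels y:
    H(y) minus the expected entropy of y after splitting on x."""
    gain = entropy(y)
    for v in np.unique(x):
        mask = (x == v)
        gain -= mask.mean() * entropy(y[mask])
    return gain
```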
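The subspace windows of contribution 2 are described only at a high level above. The sketch below shows one plausible reading of the window based on accumulating classification information: the window holds the smallest set of top-ranked features whose cumulative measure covers a given fraction of the total, and subspaces are then drawn from that window rather than from all features. The `coverage` parameter and both function names are hypothetical, not taken from the thesis.

```python
import numpy as np

def informative_window(scores, coverage=0.9):
    """Smallest set of top-ranked features whose cumulative measure reaches
    `coverage` of the total measure (`coverage` is a hypothetical parameter)."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores)[::-1]            # rank features by measure, descending
    cum = np.cumsum(scores[order])
    cut = int(np.searchsorted(cum, coverage * cum[-1])) + 1
    return order[:cut]

def select_subspace(scores, k, coverage=0.9, rng=None):
    """Draw a k-feature subspace uniformly from the informative window, so that
    subspaces containing no informative features become unlikely."""
    rng = np.random.default_rng() if rng is None else rng
    window = informative_window(scores, coverage)
    return rng.choice(window, size=min(k, len(window)), replace=False)
```

With `scores = [information_gain(X[:, j], y) for j in range(X.shape[1])]`, each tree of the forest would grow on a subspace returned by `select_subspace`.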
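Finally, a sketch of the bidirectional vote idea from contribution 3: grow each tree on an in-bag bootstrap sample, weight it by its out-of-bag accuracy, and keep only the highest-weighted trees for weighted voting. The thesis does not specify an implementation; scikit-learn's DecisionTreeClassifier stands in as the base learner, and `keep_frac` (the fraction of trees retained) is a hypothetical parameter.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bidirectional_vote_forest(X, y, n_trees=100, keep_frac=0.5, rng=None):
    """Build n_trees trees on bootstrap samples, weight each by its
    out-of-bag accuracy, and keep the top keep_frac fraction."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(y)
    scored = []
    for _ in range(n_trees):
        in_bag = rng.integers(0, n, n)              # bootstrap (in-bag) rows
        oob = np.setdiff1d(np.arange(n), in_bag)    # out-of-bag rows
        tree = DecisionTreeClassifier().fit(X[in_bag], y[in_bag])
        w = tree.score(X[oob], y[oob]) if len(oob) else 0.0
        scored.append((w, tree))
    scored.sort(key=lambda t: t[0], reverse=True)   # bigger weight = better tree
    return scored[:int(keep_frac * n_trees)]

def predict(forest, X):
    """Weighted majority vote over the retained trees."""
    classes = np.unique(np.concatenate([t.classes_ for _, t in forest]))
    score = np.zeros((len(X), len(classes)))
    for w, tree in forest:
        pred = tree.predict(X)
        for j, c in enumerate(classes):
            score[pred == c, j] += w
    return classes[score.argmax(axis=1)]
```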
Keywords/Search Tags: ensemble learning, decision forest, feature window, subspace, ensemble optimization