
Research On Feature Subspace Selection And Ensemble Optimization Of Decision Forest

Posted on: 2010-07-15
Degree: Master
Type: Thesis
Country: China
Candidate: H B Li
GTID: 2178360332457858
Subject: Computer Science and Technology

Abstract/Summary:
Ensemble learning is a machine learning approach that improves accuracy and generalization ability by learning multiple classifiers for the same problem. As an effective ensemble learning method, the decision forest is widely used in practice.

When a high-dimensional data set contains a large proportion of features that are not informative about the classes of objects, simple random sampling will select many subspaces that contain no informative features. As a consequence, the trees grown from these subspaces suffer a decrease in average strength, which can badly hurt classification accuracy.

To address this problem, we improve the original decision forest by introducing new subspace selection and model selection methods. The main contributions of this paper are as follows:

1. We give a definition of the feature measure function for random forests. After surveying current approaches to feature selection, we present a mathematical definition of the feature measure function and evaluate four instances of it in experiments: Information Gain, Gain Ratio, Chi-square, and mutual information (a minimal sketch of one such measure appears after this abstract).

2. We propose two new subspace window methods for decision forests, one based on feature counting and one based on accumulating classification information. We ran the improved algorithm on 12 data sets with the four feature measure functions from the first part. The experimental results show that the improved algorithm achieves better classification performance than the original random forest (see the subspace-selection sketch below).

3. We also propose a new model selection method, called the bidirectional vote method. This method uses the in-bag data set to build each decision tree and then uses the out-of-bag data set to evaluate and weight the corresponding tree; the bigger the weight, the better the tree. The trees with the highest weights are then chosen to vote for classification. This new decision forest based on bidirectional voting was run on 12 data sets. The experimental results show that it generally achieves better classification performance than the original random forest (the last sketch below illustrates the idea).
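As an illustration of contribution 1, the following is a minimal sketch of one feature measure function, information gain for a discrete feature. The NumPy-based implementation and function names are our own illustration, not the thesis's code; Gain Ratio, Chi-square, and mutual information would expose the same interface (a score per feature, higher meaning more informative).

```python
import numpy as np

def entropy(y):
    """Shannon entropy of a label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(x, y):
    """Information gain of a discrete feature x with respect to labels y:
    H(y) minus the expected entropy of y after splitting on x."""
    gain = entropy(y)
    for v in np.unique(x):
        mask = (x == v)
        gain -= mask.mean() * entropy(y[mask])
    return gain
```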
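The subspace windows of contribution 2 are described only at a high level above. The sketch below shows one plausible reading of the window based on accumulating classification information: the window holds the smallest set of top-ranked features whose cumulative measure covers a given fraction of the total, and subspaces are then drawn from that window rather than from all features. The `coverage` parameter and both function names are hypothetical, not taken from the thesis.

```python
import numpy as np

def informative_window(scores, coverage=0.9):
    """Smallest set of top-ranked features whose cumulative measure reaches
    `coverage` of the total measure (`coverage` is a hypothetical parameter)."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores)[::-1]            # rank features by measure, descending
    cum = np.cumsum(scores[order])
    cut = int(np.searchsorted(cum, coverage * cum[-1])) + 1
    return order[:cut]

def select_subspace(scores, k, coverage=0.9, rng=None):
    """Draw a k-feature subspace uniformly from the informative window, so that
    subspaces containing no informative features become unlikely."""
    rng = np.random.default_rng() if rng is None else rng
    window = informative_window(scores, coverage)
    return rng.choice(window, size=min(k, len(window)), replace=False)
```

With `scores = [information_gain(X[:, j], y) for j in range(X.shape[1])]`, each tree of the forest would grow on a subspace returned by `select_subspace`.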
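Finally, a sketch of the bidirectional vote idea from contribution 3: grow each tree on an in-bag bootstrap sample, weight it by its out-of-bag accuracy, and keep only the highest-weighted trees for weighted voting. The thesis does not specify an implementation; scikit-learn's DecisionTreeClassifier stands in as the base learner, and `keep_frac` (the fraction of trees retained) is a hypothetical parameter.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bidirectional_vote_forest(X, y, n_trees=100, keep_frac=0.5, rng=None):
    """Build n_trees trees on bootstrap samples, weight each by its
    out-of-bag accuracy, and keep the top keep_frac fraction."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(y)
    scored = []
    for _ in range(n_trees):
        in_bag = rng.integers(0, n, n)              # bootstrap (in-bag) rows
        oob = np.setdiff1d(np.arange(n), in_bag)    # out-of-bag rows
        tree = DecisionTreeClassifier().fit(X[in_bag], y[in_bag])
        w = tree.score(X[oob], y[oob]) if len(oob) else 0.0
        scored.append((w, tree))
    scored.sort(key=lambda t: t[0], reverse=True)   # bigger weight = better tree
    return scored[:int(keep_frac * n_trees)]

def predict(forest, X):
    """Weighted majority vote over the retained trees."""
    classes = np.unique(np.concatenate([t.classes_ for _, t in forest]))
    score = np.zeros((len(X), len(classes)))
    for w, tree in forest:
        pred = tree.predict(X)
        for j, c in enumerate(classes):
            score[pred == c, j] += w
    return classes[score.argmax(axis=1)]
```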
Keywords/Search Tags: ensemble learning, decision forest, feature window, subspace, ensemble optimization