
LibD3C2.0:an Ensemble Classifier Based On Clustering And Its Parallel Implementation

Posted on: 2015-03-08  Degree: Master  Type: Thesis
Country: China  Candidate: W Q Chen  Full Text: PDF
GTID: 2268330428461177  Subject: Computer application technology
Abstract/Summary:
A learning algorithm's generalization performance is one of the central concerns of machine learning. Ensemble learning combines different classifiers into a single model to achieve higher generalization performance than any individual member classifier. Ensemble learning has two sub-processes: first, the generation of the base classifiers; second, the combination of the individual weak classifiers. The success of an ensemble system rests on the diversity of its weak classifiers. An intuitive approach at the generation stage is to construct a large number of classifiers in the hope of achieving higher performance. However, too many classifiers place strict demands on the available computing power and storage capacity. Zhou proposed the theory of selective ensembles, summarized as "Many Could Be Better Than All": shrink the number of base classifiers by removing redundant ones that contribute little to the improvement of the ensemble system. Theoretical analysis and extensive experiments show that selective ensembles are superior to Boosting and Bagging.

In this thesis, we concentrate on improving the diversity of the weak classifiers at the generation stage and on designing an effective ensemble strategy to achieve high generalization ability. The main contributions of our work are as follows:

1. Generation of weak classifiers: taking the distribution of the original data set into consideration, we propose a random subspace method to manipulate the original data. The manipulated data is then used to train the individual classifiers, aiming to balance the diversity and performance of the weak classifiers.

2. Selection of base classifiers: after analyzing existing diversity measures, we choose the disagreement measure as our diversity measure. We apply clustering algorithms to remove redundant classifiers and use the resulting subset of classifiers for the ensemble.

3. Combination of base classifiers: we propose a hybrid approach. The hybrid model is based on affinity propagation clustering and a framework of dynamic selection and circulating, combined with a sequential search method.

4. In addition, we present a parallel framework for ensemble learning based on the methods above, named LibD3C2.0, to cope with the problems that arise when there are a large number of base classifiers.
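To make the random subspace idea concrete, here is a minimal sketch of drawing one feature subset per base classifier and projecting the data onto it. This is illustrative only: the thesis's distribution-aware subspace scheme is not detailed in the abstract, and the function names are assumptions.

```python
import random

def random_subspaces(n_features, n_classifiers, subspace_size, seed=0):
    """Draw one random feature subset per base classifier."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n_features), subspace_size))
            for _ in range(n_classifiers)]

def project(rows, features):
    """Restrict each sample (a list of feature values) to the chosen subset."""
    return [[row[f] for f in features] for row in rows]
```

Each base classifier is then trained on its own projected view of the data, which is what injects diversity into the pool.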
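The disagreement measure used for selection is the standard pairwise diversity measure: the fraction of samples on which two classifiers predict differently. A minimal implementation:

```python
def disagreement(preds_a, preds_b):
    """Fraction of samples on which two classifiers' predictions differ.
    Ranges from 0 (identical behaviour) to 1 (always disagree)."""
    assert len(preds_a) == len(preds_b)
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)
```

Higher values indicate more diverse classifiers; a pairwise matrix of these values is a natural input to a clustering step.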
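The selection step (removing redundant classifiers via clustering on the disagreement measure) can be sketched as follows. Note this uses a simple greedy rule as a stand-in for the thesis's affinity propagation clustering, which the abstract names but does not detail; the threshold and function name are assumptions.

```python
def prune_by_disagreement(predictions, threshold=0.2):
    """Greedy pruning: keep a classifier only if it disagrees with every
    already-kept classifier on more than `threshold` of the samples.
    (Simplified stand-in for affinity-propagation-based selection.)"""
    def disagreement(a, b):
        return sum(x != y for x, y in zip(a, b)) / len(a)

    kept = []
    for i, preds in enumerate(predictions):
        if all(disagreement(preds, predictions[j]) > threshold for j in kept):
            kept.append(i)
    return kept
```

Classifiers whose predictions nearly duplicate an already-selected one are dropped, shrinking the ensemble in the "Many Could Be Better Than All" spirit.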
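Since training many base classifiers is embarrassingly parallel, the parallel framework of contribution 4 can be sketched with a worker pool. The abstract does not describe LibD3C2.0's actual design, so the trainer below is a placeholder (it just returns the majority label of its data subset) and all names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

def train_base_classifier(subset):
    """Placeholder trainer: returns the majority label of the subset.
    A real implementation would fit an actual model here."""
    labels = [y for _, y in subset]
    return max(set(labels), key=labels.count)

def train_all_parallel(subsets, max_workers=4):
    """Train one base classifier per data subset, in parallel."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(train_base_classifier, subsets))
```

A thread pool keeps the sketch simple; for CPU-bound model fitting, a process pool (or the distributed setting LibD3C2.0 targets) would be the natural substitute.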
Keywords/Search Tags: Selective Ensemble Learning, Clustering, Dynamic Selection