Ensemble learning is one of the most active research topics in machine learning: it combines a number of base learners to solve a problem. In contrast to common machine learning algorithms, which generate a single learner from the training data, ensemble learning constructs a set of learners and combines them in various ways, in order to obtain stronger generalization ability and better accuracy than any single learner. There is a close relationship between the accuracy of an ensemble classifier and the diversity among its base classifiers, so studying this relationship in order to improve accuracy by enhancing diversity is critical. In this paper, random decision trees (RDT) are used as the base classifiers; because RDT is unstable, its performance improves markedly after ensembling, and its inherent randomness naturally increases diversity. We experiment with the semi-supervised learning algorithm Tri-training and the new ensemble learning algorithm BLB (Bag of Little Bootstraps), comparing them with other ensemble learning algorithms, and analyze the experimental results using measures of diversity and accuracy. On this basis, we use WeChat and a crowdsourcing strategy to perform text-classification experiments on a navigation corpus, obtaining users' manual classifications of articles as feedback. The different users' categories generate diversity, so ensemble learning is achieved through genuine crowdsourcing. In addition, we improve the two above-mentioned algorithms by studying this feedback information.

(1) We use RDT as the base classifier and construct an ensemble classifier with the semi-supervised Tri-training algorithm, trained iteratively. Tri-training generates three classifiers from the original labeled example set and then refines them using unlabeled examples.
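As an illustrative sketch of one Tri-training refinement round described above — using scikit-learn's `DecisionTreeClassifier` as a stand-in for RDT, and omitting the error-rate checks of the full Tri-training algorithm; all function and variable names here are assumptions, not the paper's implementation:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample

def tri_training_round(X_labeled, y_labeled, X_unlabeled):
    # Train three base classifiers on bootstrap samples of the labeled set.
    clfs = []
    for seed in range(3):
        Xb, yb = resample(X_labeled, y_labeled, random_state=seed)
        clfs.append(DecisionTreeClassifier(random_state=seed).fit(Xb, yb))

    # Refine each classifier with the unlabeled examples on which the
    # other two agree, using their agreed prediction as a pseudo-label.
    refined = []
    for i in range(3):
        j, k = [m for m in range(3) if m != i]
        pj = clfs[j].predict(X_unlabeled)
        pk = clfs[k].predict(X_unlabeled)
        agree = pj == pk
        X_aug = np.vstack([X_labeled, X_unlabeled[agree]])
        y_aug = np.concatenate([y_labeled, pj[agree]])
        refined.append(DecisionTreeClassifier(random_state=i).fit(X_aug, y_aug))
    return refined

def majority_vote(clfs, X):
    # Stack each classifier's predictions and take a per-example majority
    # vote (assumes non-negative integer class labels).
    preds = np.stack([c.predict(X) for c in clfs]).astype(int)
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, preds)
```

The bootstrap samples and the differing pseudo-labeled sets are what give each base classifier a different training view, which is the source of diversity the experiments measure.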
Meanwhile, the unlabeled examples may generate diversity, enhancing the differences between the base classifiers. The experiment is performed on UCI datasets, comprising 10 small datasets and 5 medium-sized datasets of different scales, with the classical ensemble learning algorithms Bagging and AdaBoost as baselines. Ten-fold cross-validation is used to obtain the averaged test accuracy and the diversity measures DIS, DF, KW, and MTI. Analyzing the relationship between them, we conclude that moderately enhancing diversity can indeed improve the accuracy of the ensemble classifier.

(2) The new ensemble learning algorithm BLB is used to study the diversity between classifiers. BLB is an improved algorithm based on the bootstrap and subsampling; diversity is increased by varying the training set. As before, the experiment uses RDT as the base classifier and is performed on the same UCI datasets as in Chapter 3, after which the relationship between diversity and accuracy is analyzed from the experimental results. The experiments show that BLB improves accuracy on the majority of the datasets compared with Bagging, which uses bootstrap sampling; this reflects, from another perspective, that diversity improves accuracy.

(3) Finally, we use WeChat and a crowdsourcing strategy. Compared with traditional manual classification, which is costly and hard to obtain, collecting manual classifications through WeChat crowdsourcing is cheap and convenient. We select the articles in the navigation corpus that are misclassified by the two above-mentioned algorithms and obtain the users' manual classifications as feedback. Meanwhile, the different users' categories generate diversity.
The users' feedback is analyzed to improve the two before-mentioned algorithms. The experimental results show that the accuracy of the improved algorithms increases significantly.
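The BLB resampling scheme described in (2) — little subsamples of size b = n^γ, each inflated back to size n by multinomial weights before training — can be sketched as follows. The defaults for γ and the subsample/resample counts are illustrative assumptions, and a standard decision tree again stands in for RDT:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def blb_ensemble(X, y, n_subsamples=5, n_resamples=3, gamma=0.7, seed=0):
    """Bag of Little Bootstraps as an ensemble builder (sketch).

    Each little subsample of size b = n**gamma is resampled to an
    effective size n via multinomial sample weights, and one tree is
    trained per resample.
    """
    rng = np.random.default_rng(seed)
    n = len(X)
    b = int(n ** gamma)
    ensemble = []
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=b, replace=False)  # little subsample
        for _ in range(n_resamples):
            # Multinomial counts summing to n emulate a size-n bootstrap
            # drawn from the b-point subsample.
            counts = rng.multinomial(n, np.full(b, 1.0 / b))
            tree = DecisionTreeClassifier(
                random_state=int(rng.integers(1 << 31)))
            tree.fit(X[idx], y[idx], sample_weight=counts)
            ensemble.append(tree)
    return ensemble

def predict_vote(ensemble, X):
    # Per-example majority vote over the ensemble's integer predictions.
    preds = np.stack([t.predict(X) for t in ensemble]).astype(int)
    return np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, preds)
```

Because each of the s × r trees sees a different subsample and a different weight vector, the training sets vary more than under plain bootstrap sampling, which is the mechanism by which BLB increases diversity.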
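Of the diversity measures mentioned above, the pairwise disagreement measure (DIS) is the simplest: for two classifiers it is the fraction of examples on which exactly one of them is correct, averaged over all classifier pairs. A minimal sketch (function and argument names are assumptions):

```python
import numpy as np
from itertools import combinations

def disagreement(preds, y):
    """Average pairwise disagreement (DIS) over an ensemble.

    preds: array of shape (L, n), each row one base classifier's
           predictions on the n test examples.
    y:     true labels, shape (n,).
    """
    correct = preds == y  # (L, n) boolean correctness matrix
    L = len(preds)
    # For each pair, the fraction of examples where one classifier is
    # correct and the other is wrong; DIS is the mean over all pairs.
    pair_dis = [np.mean(correct[i] != correct[j])
                for i, j in combinations(range(L), 2)]
    return float(np.mean(pair_dis))
```

Identical classifiers give DIS = 0; higher values indicate greater diversity, which is the quantity the experiments correlate with ensemble accuracy.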