
The Improvement Of The Weighting Method In AdaBoost

Posted on: 2011-05-18
Degree: Master
Type: Thesis
Country: China
Candidate: X L Liu
Full Text: PDF
GTID: 2178360305460240
Subject: Computer Science and Technology

Abstract/Summary:
Classification is one of the most important techniques in the field of data mining. Many classification techniques exist, such as Bayesian networks and decision trees, and these are often used as single classifiers. Much research has been devoted to improving their classification ability, but the performance of single classifiers has reached a bottleneck. Building on the theory of weak and strong learnability, the concept of ensemble learning was proposed. Ensemble learning is a technology that combines several different classifiers into one through some combination method. The final classifier it generates is called the combined classifier, and the technique itself is also known as classifier ensemble or classifier combination. It is a classification system built from multiple classifiers, each of which is called a base classifier. Experiments show that combining multiple classifiers can significantly improve classification performance, so its study has important theoretical value and practical significance.

This paper gives a comprehensive description of the main research directions in ensemble learning, including its concept, causes, generation methods, composition, and meaning, and reviews research on weighting, such as the choice of weighted object. It then describes the boosting and bagging techniques of ensemble learning in detail.

Summarizing existing research results, ensemble learning comprises two phases: the production of multiple predictive models and their combination. This paper proposes two different improvements, one for each phase, each refining the weighting method to further improve the algorithm's classification accuracy.

First, in the traditional AdaBoost algorithm, the weight of each base classifier is computed from its classification error rate on the training set at the time the classifier is built, so the weight is static with respect to test instances. If the probability that the base classifier assigns to its predicted class value for a given test instance is incorporated into the weight, the base classifier's weight better reflects the real situation of that test instance.

Second, when the traditional AdaBoost algorithm builds a base classifier, the weights of the training instances are repeatedly adjusted according to the base classifier's classification error rate on the training set, and all misclassified instances are multiplied by the same weight factor. However, the probability with which each instance is assigned a wrong class value differs from instance to instance, even though they are all treated identically. If this error probability is taken into account when the instances are reweighted, the resulting classifier achieves higher classification accuracy.

Finally, the new algorithms and the baseline algorithm for comparison were implemented in the Weka system. Experimental results show that the two new algorithms achieve better accuracy than the standard AdaBoost, improving the algorithm's performance.
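The abstract does not give the exact update formulas, so the following Python sketch is only one plausible reading of the two weighting changes: base-classifier votes are scaled by the predicted class probability at prediction time, and misclassified training instances are reweighted in proportion to the confidence of the wrong prediction. The function names and the precise update rules are assumptions for illustration, not the thesis's actual method.

```python
# A minimal sketch of the two weighting changes, using scikit-learn
# decision stumps as base classifiers and labels y in {-1, +1}.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_prob_adaboost(X, y, n_rounds=10):
    """AdaBoost.M1-style training with a probability-scaled reweighting."""
    n = len(y)
    w = np.full(n, 1.0 / n)                  # instance weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        eps = np.sum(w[pred != y])           # weighted training error
        if eps >= 0.5 or eps == 0.0:
            break
        alpha = 0.5 * np.log((1.0 - eps) / eps)
        # Second improvement (assumed form): instead of one uniform factor
        # for every misclassified instance, scale each weight increase by
        # the stump's confidence in its (wrong) prediction.
        proba = stump.predict_proba(X)       # columns follow stump.classes_
        conf = proba[np.arange(n), np.searchsorted(stump.classes_, pred)]
        w *= np.exp(alpha * conf * (pred != y))
        w /= w.sum()                         # renormalize
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def predict_prob_adaboost(stumps, alphas, X):
    """First improvement (assumed form): each vote alpha_t is scaled by the
    base classifier's predicted class probability on the test instance, so
    the effective base-classifier weight is dynamic, not static."""
    score = np.zeros(len(X))
    for stump, alpha in zip(stumps, alphas):
        pred = stump.predict(X)
        proba = stump.predict_proba(X)
        conf = proba[np.arange(len(X)), np.searchsorted(stump.classes_, pred)]
        score += alpha * conf * pred
    return np.sign(score)

# Illustrative usage on a toy dataset:
# X = np.random.randn(200, 2)
# y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
# stumps, alphas = fit_prob_adaboost(X, y)
# acc = np.mean(predict_prob_adaboost(stumps, alphas, X) == y)
```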
Keywords/Search Tags:Data mining, Classification, Ensemble learning, Classifier combination, AdaBoost, Bagging, Weighting