
Research On Adaptive Boosting Algorithm And Ensemble Classifier

Posted on: 2019-07-21
Degree: Master
Type: Thesis
Country: China
Candidate: L D Wang
Full Text: PDF
GTID: 2428330548475985
Subject: Computer Science and Technology
Abstract/Summary:
Classification is one of the core tasks in data mining: a prediction model is learned from labeled data and then used to determine the category of new, unseen data. Ensemble learning provides an effective and feasible approach to this task. Its basic idea is to construct a number of different prediction models (base classifiers) and then combine their outputs according to some strategy to produce the final output. In general, an ensemble achieves more stable output and better classification performance than a single classifier. Boosting is a representative method in this field: following certain rules, it builds a strong, accurate classifier from a set of crude, only moderately accurate, simple primary prediction models. Initially, Boosting was hard to apply to practical problems; the AdaBoost algorithm effectively solved this difficulty, became the representative algorithm of the Boosting family, and has drawn great attention.

Diversity is a key factor affecting the generalization ability of ensemble learning. This thesis therefore starts from an analysis of how diversity varies among the base classifiers built by AdaBoost, improves the performance of AdaBoost from the perspective of diversity, and gives two optimization methods. It then proposes an ensemble method based on feature selection, combining particle swarm optimization (PSO) with AdaBoost, to solve the imbalanced multi-class classification problem. The specific work is as follows.

First, to address both the question of how to measure diversity among the weak classifiers created by AdaBoost and AdaBoost's tendency to overfit, an improved AdaBoost method based on the double-fault measure (DF) is proposed. The correlation between four different diversity measures and the test error of AdaBoost is first studied and analyzed; based on those experimental results, the weak-classifier selection strategy of AdaBoost is then improved using DF (a minimal sketch of this measure is given below). Experimental results show that the improved algorithm can control overfitting and further improve classification performance.

In addition, the accuracy and the diversity of the base classifiers are two important aspects affecting the generalization ability of ensemble learning. To preserve accuracy while increasing diversity, and thereby further improve generalization, we combine clustering with the AdaBoost algorithm. The training samples are first clustered into multiple groups; AdaBoost is then trained on each group to obtain a strong classifier, and the strong classifiers are combined by weighted voting. The weight of each classifier is adaptive: it is computed from the similarity between the test sample and each group, and from the classification confidence of that group's strong classifier on the test sample. Compared with representative ensemble methods such as Bagging, Random Forest and AdaBoost on 10 data sets from the UCI repository, this method achieves higher classification accuracy.
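The abstract does not give the exact similarity and confidence formulas for this clustering-based ensemble, so the following sketch makes illustrative assumptions: similarity is the inverse of the distance from the test sample to a group's centroid, confidence is the maximum class probability of that group's AdaBoost model, and KMeans stands in for the unspecified clustering algorithm. Function names are likewise only placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import AdaBoostClassifier

def fit_clustered_ensemble(X, y, n_groups=3):
    """Cluster the training set, then train one AdaBoost model per group."""
    km = KMeans(n_clusters=n_groups, n_init=10, random_state=0).fit(X)
    models = [AdaBoostClassifier(n_estimators=50, random_state=0)
                  .fit(X[km.labels_ == g], y[km.labels_ == g])
              for g in range(n_groups)]
    return km, models

def predict_one(km, models, x, classes):
    """Weighted vote: each group's model votes for its predicted label
    with weight similarity(x, group) * confidence(model, x)."""
    x = np.asarray(x).reshape(1, -1)
    dists = km.transform(x)[0]        # distance of x to each group centroid
    sims = 1.0 / (1.0 + dists)        # assumed similarity: inverse distance
    votes = {c: 0.0 for c in classes}
    for sim, model in zip(sims, models):
        proba = model.predict_proba(x)[0]
        label = model.classes_[np.argmax(proba)]
        votes[label] += sim * proba.max()   # confidence = max class probability
    return max(votes, key=votes.get)        # e.g. classes = np.unique(y)
```

Because the weights depend on the test sample, a sample lying close to a given cluster is judged mainly by the classifier trained on that region of the input space, which is what makes the voting adaptive.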
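For the first method above, the standard definition of the double-fault measure between two classifiers is the fraction of samples that both misclassify; lower values indicate higher diversity. A minimal sketch follows; averaging over all pairs is a common convention, not a detail taken from this abstract.

```python
import numpy as np

def double_fault(pred_a, pred_b, y_true):
    """Double-fault measure: fraction of samples misclassified by BOTH
    classifiers. Lower values mean higher diversity."""
    wrong_a = np.asarray(pred_a) != np.asarray(y_true)
    wrong_b = np.asarray(pred_b) != np.asarray(y_true)
    return float(np.mean(wrong_a & wrong_b))

def mean_double_fault(preds, y_true):
    """Average pairwise double-fault over a list of prediction vectors."""
    k = len(preds)
    values = [double_fault(preds[i], preds[j], y_true)
              for i in range(k) for j in range(i + 1, k)]
    return float(np.mean(values))
```

A DF-based selection strategy would then, at each boosting round, prefer a candidate weak classifier whose average double-fault with the classifiers already in the ensemble is low, so that new members tend not to repeat the errors of existing ones.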
Finally, for the imbalanced classification problem, we explore an effective solution that combines feature selection with the fact that AdaBoost learns models favoring high-weight samples. Data preprocessing is performed first: to remove irrelevant and redundant features, the PSO algorithm is used to optimize feature selection, which reduces the risk of minority-class samples being treated as noise. To shorten the evolution time of PSO, an approximately optimal particle is generated from the importance of the features when the population is initialized, so that the swarm begins its search in a more reasonable direction. The AdaBoost algorithm, which learns models that favor high-weight samples, is then applied to increase the focus on the minority class while guaranteeing high overall accuracy (the feature-selection step is sketched below). The validity of the method is verified by comparative experiments with other imbalanced-learning algorithms on 7 imbalanced data sets.
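A minimal sketch of the PSO-based feature selection, under stated assumptions: the fitness function (cross-validated AdaBoost accuracy) and the importance ranking used to seed the first particle (mutual information) are placeholders, since the abstract does not specify the exact criteria; for imbalanced data, a metric such as G-mean would be the natural substitute for plain accuracy.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def fitness(mask, X, y):
    """Score a feature subset by cross-validated AdaBoost accuracy
    (placeholder for the thesis's unspecified, imbalance-aware criterion)."""
    if not mask.any():
        return 0.0
    clf = AdaBoostClassifier(n_estimators=30, random_state=0)
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

def pso_feature_selection(X, y, n_particles=20, n_iter=30,
                          w=0.7, c1=1.5, c2=1.5):
    """Binary PSO over 0/1 feature masks (sigmoid transfer function)."""
    d = X.shape[1]
    pos = (rng.random((n_particles, d)) < 0.5).astype(float)
    # Seed one near-optimal particle from a feature-importance ranking,
    # mirroring the initialization idea described in the abstract
    # (mutual information is an assumed stand-in for the exact scheme).
    mi = mutual_info_classif(X, y, random_state=0)
    pos[0] = (mi > np.median(mi)).astype(float)
    vel = 0.1 * rng.standard_normal((n_particles, d))
    pbest = pos.copy()
    pbest_fit = np.array([fitness(p.astype(bool), X, y) for p in pos])
    gbest = pbest[np.argmax(pbest_fit)].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, d))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        # Sigmoid of velocity gives the probability that each bit is 1.
        pos = (rng.random((n_particles, d)) < 1 / (1 + np.exp(-vel))).astype(float)
        for i in range(n_particles):
            f = fitness(pos[i].astype(bool), X, y)
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i].copy(), f
        gbest = pbest[np.argmax(pbest_fit)].copy()
    return gbest.astype(bool)   # final feature mask
```

The returned mask would then be used to train the final classifier on the reduced feature set, e.g. AdaBoostClassifier().fit(X_train[:, mask], y_train).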
Keywords/Search Tags: Ensemble Learning, AdaBoost, Diversity, Classification