
Comparison Of AdaBoost-Based Learning Algorithms Of Classifiers

Posted on: 2015-03-19
Degree: Master
Type: Thesis
Country: China
Candidate: T Lu
Full Text: PDF
GTID: 2268330425484731
Subject: Computer application technology
Abstract/Summary:
Pattern classification is one of the important fields in data mining. A classifier first learns from training samples with class labels and then classifies new samples without class labels. Since a single classifier is often not accurate enough, combined classifiers may perform better.

This thesis mainly studies the AdaBoost algorithm for combining classifiers. In this algorithm, each sample is assigned a weight that represents the probability of the sample being selected into the training subset. During each iteration, the weights of samples that were misclassified in the previous round increase, while the weights of correctly classified samples decrease. In this way, AdaBoost focuses on the samples that are difficult to classify and thus improves classification accuracy.

This thesis applies the AdaBoost algorithm to imbalanced data sets. Four different base classifiers are used: the Fisher linear discriminant, the pseudo-inverse linear classifier, the Naive Bayes classifier, and the C4.5 decision tree. In the experiments, we compare and analyze how the four combined classifiers affect the classification accuracy of the minority class and the AUC over all samples. The experimental results offer guidance for practical applications.

Finally, this thesis improves the AdaBoost algorithm. The main idea is that, during the iterations, the size of the training subset is no longer fixed: for each sample, its weight is multiplied by the size of the original training set, and the result is rounded up; this number determines how many times the sample is placed into the new training subset. On the one hand, the training subset then contains every sample, so no information is lost and classification performance improves. On the other hand, it avoids the situation where the training subset contains too many samples from one class and too few, or none, from the others, thus effectively avoiding over-fitting and bias.
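The weight-update mechanism described above can be illustrated by a minimal sketch of discrete AdaBoost. This is not the thesis's exact implementation: it assumes binary labels in {-1, +1} and uses scikit-learn's DecisionTreeClassifier as a stand-in base classifier (the thesis compares four base classifiers); the function names adaboost_fit and adaboost_predict are hypothetical.

```python
# Minimal sketch of discrete AdaBoost with binary labels in {-1, +1}.
# The base classifier here (a depth-1 decision tree) is an assumption;
# the thesis uses Fisher LDA, pseudo-inverse linear, Naive Bayes, and C4.5.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=50):
    n = len(y)
    w = np.full(n, 1.0 / n)               # all samples start with equal weight
    learners, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.sum(w[pred != y])        # weighted training error
        if err >= 0.5:                    # base learner no better than chance
            break
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
        # Misclassified samples (y * pred = -1) get larger weights,
        # correctly classified ones (y * pred = +1) get smaller weights.
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()                      # renormalize to a distribution
        learners.append(stump)
        alphas.append(alpha)
    return learners, alphas

def adaboost_predict(X, learners, alphas):
    # Final decision: weighted majority vote of the base classifiers.
    scores = sum(a * h.predict(X) for a, h in zip(alphas, learners))
    return np.sign(scores)
```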
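The modified resampling step of the improved algorithm can be sketched as follows, again as an illustration rather than the thesis's exact code; the function name resample_by_weight is hypothetical. Each sample i appears ceil(w_i * N) times in the new training subset, where N is the size of the original training set.

```python
# Sketch of the weight-proportional resampling described above:
# sample i is copied ceil(w_i * N) times into the next training subset,
# so the subset size is no longer fixed and every sample is retained.
import numpy as np

def resample_by_weight(X, y, w):
    counts = np.ceil(w * len(y)).astype(int)    # ceil(w_i * N) copies of sample i
    idx = np.repeat(np.arange(len(y)), counts)  # expand indices by their counts
    return X[idx], y[idx]
```

Since the weights sum to 1, every sample with positive weight is copied at least once, so no class can disappear from the training subset, which is the property the thesis relies on to avoid over-fitting and bias on imbalanced data.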
Keywords/Search Tags: AdaBoost algorithm, Base classifier, Training subsets, Imbalanced data sets, Over-fitting, Bias