
Comparison Of AdaBoost-Based Learning Algorithms Of Classifiers

Posted on: 2015-03-19
Degree: Master
Type: Thesis
Country: China
Candidate: T Lu
Full Text: PDF
GTID: 2268330425484731
Subject: Computer application technology
Abstract/Summary:
Pattern classification is one of the important fields in data mining. A classifier first learns from training samples with class labels and then classifies new samples without class labels. Since a single classifier is often not accurate enough, combined classifiers may perform better.

This thesis mainly studies the AdaBoost algorithm for combining classifiers. In this algorithm, each sample is assigned a weight that represents the probability of the sample being selected into the training subset. During each iteration, the weights of samples that were misclassified in the previous round increase, while the weights of correctly classified samples decrease. In this way, AdaBoost focuses on the samples that are difficult to classify and thus improves classification accuracy.

This thesis applies the AdaBoost algorithm to imbalanced data sets. Four different base classifiers are used: the Fisher linear discriminant, the pseudo-inverse linear classifier, the Naive Bayes classifier, and the C4.5 decision tree. In the experiments, we compare and analyze how the four combined classifiers affect the classification accuracy of the minority class and the AUC over all samples. The experimental results offer guidance for practical applications.

Finally, this thesis improves the AdaBoost algorithm. The main idea is that, during the iterations, the size of the training subset is no longer fixed: for each sample, its weight is multiplied by the size of the original training set, and the result is rounded up; this number determines how many times the sample is placed into the new training subset. On the one hand, the training subset then contains every sample, so no information is lost and classification performance improves. On the other hand, it avoids the situation where the training subset contains too many samples from one class and too few, or none, from the others, thus effectively avoiding over-fitting and bias.
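The weight-update mechanism described above can be illustrated by a minimal sketch of discrete AdaBoost. This is not the thesis's exact implementation: it assumes binary labels in {-1, +1} and uses scikit-learn's DecisionTreeClassifier as a stand-in base classifier (the thesis compares four base classifiers); the function names adaboost_fit and adaboost_predict are hypothetical.

```python
# Minimal sketch of discrete AdaBoost with binary labels in {-1, +1}.
# The base classifier here (a depth-1 decision tree) is an assumption;
# the thesis uses Fisher LDA, pseudo-inverse linear, Naive Bayes, and C4.5.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=50):
    n = len(y)
    w = np.full(n, 1.0 / n)               # all samples start with equal weight
    learners, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.sum(w[pred != y])        # weighted training error
        if err >= 0.5:                    # base learner no better than chance
            break
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
        # Misclassified samples (y * pred = -1) get larger weights,
        # correctly classified ones (y * pred = +1) get smaller weights.
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()                      # renormalize to a distribution
        learners.append(stump)
        alphas.append(alpha)
    return learners, alphas

def adaboost_predict(X, learners, alphas):
    # Final decision: weighted majority vote of the base classifiers.
    scores = sum(a * h.predict(X) for a, h in zip(alphas, learners))
    return np.sign(scores)
```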
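The modified resampling step of the improved algorithm can be sketched as follows, again as an illustration rather than the thesis's exact code; the function name resample_by_weight is hypothetical. Each sample i appears ceil(w_i * N) times in the new training subset, where N is the size of the original training set.

```python
# Sketch of the weight-proportional resampling described above:
# sample i is copied ceil(w_i * N) times into the next training subset,
# so the subset size is no longer fixed and every sample is retained.
import numpy as np

def resample_by_weight(X, y, w):
    counts = np.ceil(w * len(y)).astype(int)    # ceil(w_i * N) copies of sample i
    idx = np.repeat(np.arange(len(y)), counts)  # expand indices by their counts
    return X[idx], y[idx]
```

Since the weights sum to 1, every sample with positive weight is copied at least once, so no class can disappear from the training subset, which is the property the thesis relies on to avoid over-fitting and bias on imbalanced data.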
Keywords/Search Tags: AdaBoost algorithm, Base classifier, Training subsets, Imbalanced data sets, Over-fitting, Bias