An Adaptive Sampling Ensemble Classifier For Learning From Imbalanced Data Sets

Posted on:2011-04-16

Degree:Master

Type:Thesis

Country:China

Candidate:Y Da

GTID:2178360305494394

Subject:Computer Science and Technology

Abstract/Summary:

Large volumes of data are now available every day for extraction and analysis, and data mining techniques have shown great success in many real-world applications. Classification models are widely used in applications such as oil spill detection, credit card fraud detection, and medical diagnosis, among others.

The overall objective of this research is to analyze a technique for increasing the accuracy of classifiers built from imbalanced datasets. Imbalanced datasets, in which one class is represented by a larger number of instances than the other classes, are common in data mining problems. Traditional machine learning methods are sensitive to this kind of data: they tend to favor the predominant classes and ignore the lower-frequency cases. Models generated from databases with imbalanced classes tend to have low accuracy on the minority classes, and in many cases the minority class is the class of greatest interest.

To cope with the imbalance problem, an ensemble-based algorithm is presented that creates new balanced training sets containing all minority-class examples and an under-sampled subset of the majority class. In each round, the algorithm identifies hard examples in the majority class and generates synthetic examples for the next round. For each training set, a weak learner is used as the base classifier. Final predictions are obtained by majority vote.

The proposed algorithm, E-AdSampling, is evaluated on 6 datasets from the UCI repository, taking into consideration F-measure, G-mean, overall accuracy, and AUC (area under the ROC curve), and is compared with some known algorithms. Experimental results demonstrate the effectiveness of the proposed algorithm, with good results on all the measures.
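
The round structure described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's E-AdSampling implementation, and several details are assumptions because the abstract does not specify them: the weak learner is a decision stump, a "hard" example is a majority-class example misclassified by the latest weak learner, and synthetic examples are made by interpolating between two hard examples and placed on the majority side of the next round's training set. All function names (`train_stump`, `fit_ensemble`, `predict`, `g_mean`) are hypothetical.

```python
import math
import random


def train_stump(X, y):
    """Weak learner (assumed): the single-feature threshold rule with fewest errors."""
    best = None
    for feat in range(len(X[0])):
        for thr in sorted({row[feat] for row in X}):
            for sgn in (1, -1):
                errors = sum(
                    (1 if sgn * (row[feat] - thr) >= 0 else 0) != label
                    for row, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, feat, thr, sgn)
    _, f, t, s = best

    def stump(row):
        return 1 if s * (row[f] - t) >= 0 else 0

    return stump


def fit_ensemble(X, y, rounds=5, seed=0):
    """Train one weak learner per balanced set. Label 1 = minority, 0 = majority."""
    rng = random.Random(seed)
    minority = [row for row, label in zip(X, y) if label == 1]
    majority = [row for row, label in zip(X, y) if label == 0]
    n = len(minority)
    models, carried = [], []
    for _ in range(rounds):
        # Balanced set: all minority rows plus an equal-sized majority part
        # (synthetic rows from the previous round, topped up by under-sampling).
        synth = carried[:n]
        k = min(n - len(synth), len(majority))
        maj_part = rng.sample(majority, k) + synth
        model = train_stump(minority + maj_part, [1] * n + [0] * len(maj_part))
        models.append(model)
        # Assumption: "hard" = majority rows the latest learner gets wrong.
        hard = [row for row in majority if model(row) != 0]
        # Assumption: synthetic rows interpolate between two hard rows.
        carried = []
        for a in hard[:n]:
            b = rng.choice(hard)
            u = rng.random()
            carried.append(tuple(ai + u * (bi - ai) for ai, bi in zip(a, b)))
    return models


def predict(models, row):
    """Final prediction by majority vote over the weak learners."""
    votes = sum(m(row) for m in models)
    return 1 if 2 * votes > len(models) else 0


def g_mean(y_true, y_pred):
    """Geometric mean of the true-positive and true-negative rates."""
    pos = sum(1 for t in y_true if t == 1)
    neg = sum(1 for t in y_true if t == 0)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return math.sqrt((tp / pos) * (tn / neg))
```

Calling `fit_ensemble(X, y)` on a list of feature tuples `X` and 0/1 labels `y`, then `predict(models, row)` for each row, yields predictions that can be scored with `g_mean`. An odd number of rounds avoids tied votes.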

Keywords/Search Tags:

data mining, ensemble algorithm, imbalanced data sets, synthetic samples

Related items

1	Research On Imbalanced Data Classification Methods For Unsafe Samples
2	Study On Imbalanced Data Sets Classification Method And Its Application In Telecommunication
3	Research On Classification Algorithms Of Data Mining Based On Imbalanced Data Sets
4	Research On Classification Algorithm For Imbalanced Data Sets Based On Support Vector Machines
5	The Classification Of Imbalanced Large Data Sets Based On Map Reduce
6	Research On Imbalanced Data Classification In Financial Field
7	Research On Ensemble Approach For Classification Of Imbalanced Data Sets
8	Research On The Classification Algorithm Of Imbalanced Data Sets
9	The Research Of Imbalanced Data Classification
10	The Algorithm Research Of Associative Classification And Classification Based On Imbalanced Data