
An Adaptive Sampling Ensemble Classifier For Learning From Imbalanced Data Sets

Posted on: 2011-04-16
Degree: Master
Type: Thesis
Country: China
Candidate: Y Da
Full Text: PDF
GTID: 2178360305494394
Subject: Computer Science and Technology
Abstract/Summary:
Data is now routinely collected for extraction and analysis, and data mining techniques have shown great success in many real-world applications. Classification models are widely used in applications such as oil spill detection, credit card fraud detection, and medical diagnosis.

The overall objective of this research is to analyze a technique for increasing the accuracy of classifiers built from imbalanced data sets. Imbalanced data sets, in which one class is represented by a larger number of instances than the other classes, are common in data mining problems. Traditional machine learning algorithms are sensitive to this kind of data: they tend to favor the predominant classes and ignore the lower-frequency cases. Models built from data with imbalanced classes therefore tend to have low accuracy on the minority classes, and in many cases these are the classes of greatest interest.

To cope with the imbalance problem, an ensemble-based algorithm, E-AdSampling, is presented. It creates new balanced training sets that contain all the minority-class examples together with an under-sampled majority class. In each round, the algorithm identifies hard examples in the majority class and generates synthetic examples for the next round. A weak learner is used as the base classifier for each training set, and final predictions are made by majority vote.

E-AdSampling is evaluated on 6 data sets from the UCI repository using F-measure, G-mean, overall accuracy, and AUC (area under the ROC curve), and is compared with several known algorithms. Experimental results demonstrate the effectiveness of the proposed algorithm, with good results on all measures.
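The abstract gives only an outline of the procedure, so the following Python sketch is an illustration of that outline and not the thesis's E-AdSampling implementation. Several details are assumptions made here: the weak learner is a decision stump, "hard" majority examples are taken to be those the current ensemble misclassifies, synthetic examples are produced by linear interpolation between two hard examples, at most half of each round's majority portion is synthetic, and vote ties go to the minority class. The names `train_stump`, `vote`, `fit_ensemble`, and `g_mean` are hypothetical.

```python
import math
import random

def train_stump(X, y):
    """Weak learner (assumed): the single-feature threshold rule with fewest training errors."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for sign in (1, -1):
                err = sum((1 if sign * (row[f] - t) > 0 else 0) != label
                          for row, label in zip(X, y))
                if best is None or err < best[0]:
                    best = (err, f, t, sign)
    _, f, t, sign = best
    return lambda row: 1 if sign * (row[f] - t) > 0 else 0

def vote(models, row):
    """Majority vote; ties go to the minority class (label 1), an arbitrary choice."""
    return 1 if 2 * sum(m(row) for m in models) >= len(models) else 0

def fit_ensemble(X, y, rounds=5, seed=0):
    """Train one stump per round on a balanced set; label 1 is the minority class."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == 1]
    majority = [x for x, label in zip(X, y) if label == 0]
    k = min(len(minority), len(majority))
    synthetic, models = [], []
    for _ in range(rounds):
        # Balanced set: all minority examples plus k majority-side examples
        # (synthetic ones from the previous round, topped up by random under-sampling).
        maj_part = synthetic + rng.sample(majority, k - len(synthetic))
        train_X = minority + maj_part
        train_y = [1] * len(minority) + [0] * len(maj_part)
        models.append(train_stump(train_X, train_y))
        # Hard examples (assumed definition): majority points the ensemble gets wrong.
        hard = [x for x in majority if vote(models, x) != 0]
        # Synthetic examples for the next round: interpolate between two hard examples.
        synthetic = []
        for _ in range(k // 2 if hard else 0):
            a, b = rng.choice(hard), rng.choice(hard)
            gap = rng.random()
            synthetic.append([ai + gap * (bi - ai) for ai, bi in zip(a, b)])
    return models

def g_mean(models, X, y):
    """G-mean: square root of (minority recall * majority recall)."""
    tp = sum(vote(models, x) == 1 for x, label in zip(X, y) if label == 1)
    tn = sum(vote(models, x) == 0 for x, label in zip(X, y) if label == 0)
    return math.sqrt((tp / y.count(1)) * (tn / y.count(0)))
```

Calling `fit_ensemble(X, y)` on a list of numeric feature rows `X` and 0/1 labels `y` returns the list of trained stumps, and `vote(models, row)` classifies a new row. In this sketch the synthetic points concentrate later rounds on the region of the majority class that earlier rounds handled poorly, which is one plausible reading of the adaptive sampling step described above.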
Keywords/Search Tags: data mining, ensemble algorithm, imbalanced data sets, synthetic samples