The Research Of Ensemble Pruning Method For Imbalanced Data

Posted on: 2015-01-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Zhang
GTID: 2298330431496181
Subject: Computer software and theory
Abstract/Summary:
Class-imbalanced data is closely related to daily life, and classifying such data correctly is of great significance, so imbalanced data classification is a hot research topic in data mining. However, traditional state-of-the-art classifiers and ensembles do not perform well on data sets with an imbalanced class distribution. Moreover, an ensemble not only occupies a large amount of memory but also significantly increases prediction response time. Ensemble pruning is widely adopted in ensemble learning to address these problems. Common ensemble pruning algorithms use the training set as the pruning set and therefore tend to choose base classifiers that favor negative-class (majority) instances, so they are ill-suited to pruning ensembles built for imbalanced data. Only a few studies have addressed ensemble pruning for imbalanced data. This thesis studies pruning sets for imbalanced data and uses them to build ensemble classifiers for imbalanced data.
First, we propose two new algorithms, EPPS (Ensemble Pruning based on Pruning set of SMOTE) and EPPU (Ensemble Pruning based on Pruning set of Under-sampling), which are based on SMOTE (the Synthetic Minority Over-sampling Technique) and simple random sampling, respectively. The two algorithms use these techniques to create a relatively balanced pruning set and then use that set to supervise the ensemble pruning process. Both methods significantly improve the classification performance of ensembles on imbalanced data. We then propose an algorithm called EPPE (Ensemble Pruning based on Positive Examples), which takes the positive-class instances and their neighboring negative-class instances as the pruning set and selects from the pool the base classifiers that perform better on the positive class and those neighboring negative instances. This method produces an ensemble that generalizes better on imbalanced instance sets.
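The abstract names the ingredients of the three methods but not their exact procedures, so the following is only a minimal sketch under stated assumptions, not the thesis's implementation. It shows three ways of building a pruning set (an EPPS-style SMOTE-balanced set, an EPPU-style randomly under-sampled set, and an EPPE-style set of positives plus their k nearest negatives, which is our reading of "neighboring negative-class instances" given the KNN keyword) and one generic pruning step. All function names are hypothetical; labels are assumed to be 1 for the positive (minority) class and 0 for the negative class; base classifiers are plain callables; and the selection rule (greedy forward selection maximizing the G-mean of a majority vote) is an assumption, since the abstract does not state the criterion. The SMOTE step is also simplified: it picks a random positive each time rather than cycling through all positives.

```python
import math
import random

# Assumption: label 1 = positive (minority) class, label 0 = negative (majority) class.
POS, NEG = 1, 0


def smote_pruning_set(X, y, k=5, rng=None):
    """EPPS-style pruning set (sketch): SMOTE-oversample positives until classes balance.

    Simplified SMOTE: a random positive is paired with one of its k nearest positive
    neighbours and a synthetic point is interpolated between them. Needs >= 2 positives.
    """
    if rng is None:
        rng = random.Random(0)
    pos = [x for x, t in zip(X, y) if t == POS]
    n_neg = sum(1 for t in y if t == NEG)
    synthetic = []
    while len(pos) + len(synthetic) < n_neg:
        i = rng.randrange(len(pos))
        # Indices of the k nearest other positives (Euclidean distance).
        nbrs = sorted((j for j in range(len(pos)) if j != i),
                      key=lambda j: math.dist(pos[i], pos[j]))[:k]
        j = rng.choice(nbrs)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(pos[i], pos[j])])
    return list(X) + synthetic, list(y) + [POS] * len(synthetic)


def undersample_pruning_set(X, y, rng=None):
    """EPPU-style pruning set (sketch): all positives plus an equal-size random sample of negatives."""
    if rng is None:
        rng = random.Random(0)
    pos = [i for i, t in enumerate(y) if t == POS]
    neg = [i for i, t in enumerate(y) if t == NEG]
    keep = pos + rng.sample(neg, min(len(pos), len(neg)))
    return [X[i] for i in keep], [y[i] for i in keep]


def positive_neighbourhood_pruning_set(X, y, k=3):
    """EPPE-style pruning set (our interpretation): positives plus their k nearest negatives."""
    pos = [i for i, t in enumerate(y) if t == POS]
    neg = [i for i, t in enumerate(y) if t == NEG]
    keep = set(pos)
    for i in pos:
        keep.update(sorted(neg, key=lambda j: math.dist(X[i], X[j]))[:k])
    keep = sorted(keep)
    return [X[i] for i in keep], [y[i] for i in keep]


def g_mean(pred, y):
    """Geometric mean of positive-class and negative-class recall (both classes must be present)."""
    tp = sum(p == t == POS for p, t in zip(pred, y))
    tn = sum(p == t == NEG for p, t in zip(pred, y))
    n_pos = sum(t == POS for t in y)
    n_neg = len(y) - n_pos
    return math.sqrt((tp / n_pos) * (tn / n_neg))


def vote(members, x):
    """Unweighted majority vote; ties go to the negative class."""
    votes = sum(clf(x) for clf in members)
    return POS if 2 * votes > len(members) else NEG


def greedy_prune(pool, Xp, yp, size):
    """Forward selection: repeatedly add the classifier that maximises G-mean on the pruning set."""
    chosen, remaining = [], list(pool)
    while remaining and len(chosen) < size:
        best = max(remaining,
                   key=lambda c: g_mean([vote(chosen + [c], x) for x in Xp], yp))
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

As a usage example, with a hypothetical list of trained base classifiers `pool` and training data `X_train`, `y_train`, the call `greedy_prune(pool, *undersample_pruning_set(X_train, y_train), size=10)` would prune the pool against an EPPU-style set. G-mean is used here because it rewards both classes; plain accuracy on a pruning set would reproduce the bias toward the negative class that the abstract attributes to conventional pruning.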
The experimental results show that EPPS, EPPU and EPPE use smaller pruning sets to build ensembles with better classification performance than EasyEnsemble, Bagging and C4.5. In particular, EPPE performs significantly better than the other classification algorithms on most instance sets, and all three ensemble pruning algorithms significantly reduce the size of the ensemble.
Keywords/Search Tags: class-imbalance, ensemble pruning, pruning set, KNN