Font Size: a A A

Ensemble Learning And Ensemble Selection For Rare Class Problem

Posted on:2014-01-10Degree:MasterType:Thesis
Country:ChinaCandidate:X Y LiFull Text:PDF
GTID:2248330398978173Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Traditional classification methods are difficult to fit the sample set distribution of rare class due to these methods tend to minimize the training error. A common method to solve this problem is to build relative balanced training data sets, and then learn the accurate classification model. Most of the current works (e.g., EasyEnsemble)adopt sampling Based techniques to construct the balanced training data sets, such as SMOTE, over sampling technology, under sampling technology, etc.The same as the above methods, this paper also tries to construct a relative balanced training data sets. Unlike the traditional methods, this article attempts to do in-depth research to the rare class from the perspective of data partitioning combining with the methods of ensemble classification. The main contributions in this paper are as follows:(1) Propose a new ensemble learning method Based on partition called PBEL (partition-Based ensemble learning). PBEL employs clustering methods to partition majority class instances into more clusters and associates each cluster with a new class, and then PBEL mixes minority instances with the clusters’ instances to form relative balanced training sets to train ensemble classifier. For prediction, PBEL adopts majority voting method to classify instance label and maps the predicted label to majority class or minority class in order to improve the performance of ensemble classifiers on rare class data sets.(2) Apply ensemble selection method to the rare class classification problem. Before PBEL prediction, this article applies ensemble selection method to models builded by the PBEL in order to reduce the scale of PBEL and further improve its classification performance by select a group of optimal (or sub-optimal) subensembles from ensemble classifiers.Empirical results show that ensemble or sampling-Based ensemble can effectively improve the performance of imbalanced data sets classification on algorithm. The proposed ensemble selection approach of this article has much better performance on rare classification problem compared to other state-of-the-art methods.
Keywords/Search Tags:rare class, clustering methods, ensemble classifier, ensembleselection
PDF Full Text Request
Related items