Imbalanced Learning Based On Undersampling Technique And Rotation Forest

Posted on: 2021-04-18
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Diao
Full Text: PDF
GTID: 2370330626453669
Subject: Computer application technology
Abstract/Summary:
As one of the most challenging and attractive issues in pattern recognition and machine learning, the class imbalance problem has attracted increasing attention. It is characterized by a severely skewed distribution of examples across classes. In many practical problems, such as oil spill detection, incidents occur rarely, but the consequences of such rare events can be severe once they occur. Therefore, effectively identifying the minority samples of interest is often more important than correctly classifying the majority samples.

Undersampling-based ensemble learning is an effective and commonly used approach to imbalance problems, but the datasets obtained by undersampling are often small. How to learn a highly accurate base classifier on such small datasets is one of the core problems of this approach. Rotation forest has been observed to generalize better than other ensemble methods such as bagging and boosting, so this thesis chooses rotation forest as the base learner in order to obtain a base classifier with high accuracy. In addition, rotation forest is more sensitive to the sampling technique than some robust methods, including SVM and neural networks; thus, it is easier to create diverse individual classifiers using rotation forest. This is another reason why an ensemble learning method based on undersampling can succeed on imbalanced problems. Therefore, this thesis studies how to combine undersampling and rotation forest effectively, and proposes an effective method for dealing with imbalance problems.

The main work of this thesis is as follows:

(1) Ensemble with Undersampling technique and Rotation forest (EUR) is proposed. Two versions of this improved undersampling-based ensemble method are implemented, EUR-I and EUR-II. EUR-I first undersamples subsets from the majority class and then learns each classifier using rotation forest on the data obtained by combining each subset with the minority class. Therefore, EUR-I is an ensemble of ensembles. EUR-II is similar to EUR-I, except that after learning each classifier it removes from further consideration the majority-class examples that are correctly classified with high confidence.

(2) Improving Rotation forest with Undersampling technique (IRU) is proposed. Building on the EUR method, this method moves undersampling into the learning process of rotation forest itself. Specifically, it samples subsets from the majority class, learns a projection matrix from each subset, obtains new training sets by projecting re-undersampled subsets of the original data set into the new spaces defined by these matrices, and constructs an individual classifier from each training set.

The experimental results on the KEEL data sets show that, compared with traditional undersampling-based ensemble learning methods, the proposed algorithms have significant advantages on the evaluation measures of recall, G-mean, F-measure and AUC.
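
The EUR-I and EUR-II procedures in item (1) can be illustrated with a minimal sketch. It is not the thesis implementation, and it does not cover IRU. The abstract does not give the details, so the following are assumptions made only for illustration: every function and parameter name (`fit_centroid_scorer`, `fit_undersampling_ensemble`, `predict`, `g_mean`, `n_rounds`, `prune_below`) is hypothetical; the base learner is a simple nearest-centroid scorer standing in for rotation forest; each undersampled subset has the same size as the minority class; pruning uses only the most recently learned classifier; and member scores are combined by plain averaging.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

Point = Sequence[float]
Scorer = Callable[[Point], float]  # maps x to a minority-class score in [0, 1]


def fit_centroid_scorer(majority: List[Point], minority: List[Point]) -> Scorer:
    """Hypothetical stand-in base learner (NOT rotation forest).

    Scores a point by its relative distance to the two class centroids.
    """
    def centroid(rows: List[Point]) -> List[float]:
        return [sum(col) / len(rows) for col in zip(*rows)]

    c_maj, c_min = centroid(majority), centroid(minority)

    def score(x: Point) -> float:
        d_maj, d_min = math.dist(x, c_maj), math.dist(x, c_min)
        total = d_maj + d_min
        # Closer to the minority centroid -> score closer to 1.
        return 0.5 if total == 0 else d_maj / total

    return score


def fit_undersampling_ensemble(
    majority: List[Point],
    minority: List[Point],
    n_rounds: int = 10,
    prune_below: Optional[float] = None,
    fit_base: Callable[[List[Point], List[Point]], Scorer] = fit_centroid_scorer,
    seed: int = 0,
) -> List[Scorer]:
    """EUR-I-style loop; pass prune_below to add EUR-II-style pruning."""
    rng = random.Random(seed)
    pool = list(majority)
    scorers: List[Scorer] = []
    for _ in range(n_rounds):
        # Assumption: each subset is balanced against the minority class.
        subset = rng.sample(pool, min(len(minority), len(pool)))
        scorer = fit_base(subset, list(minority))
        scorers.append(scorer)
        if prune_below is not None:
            # EUR-II idea: drop majority examples that the new classifier
            # already labels as majority with high confidence (low score).
            kept = [x for x in pool if scorer(x) >= prune_below]
            # Assumption: skip pruning if too few majority examples remain.
            if len(kept) >= len(minority):
                pool = kept
    return scorers


def predict(scorers: List[Scorer], x: Point, threshold: float = 0.5) -> int:
    """Average the member scores; 1 = minority class, 0 = majority class."""
    mean_score = sum(s(x) for s in scorers) / len(scorers)
    return 1 if mean_score >= threshold else 0


def g_mean(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Geometric mean of the recalls on class 1 and class 0.

    Assumes y_true contains at least one example of each class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = sum(1 for t in y_true if t == 0)
    return math.sqrt((tp / pos) * (tn / neg))
```

Calling `fit_undersampling_ensemble(majority, minority)` gives the EUR-I-style behaviour, and adding, for example, `prune_below=0.2` enables the EUR-II-style removal step. To get closer to the method described above, a rotation forest trainer could be passed as `fit_base`, which would make the result an ensemble of ensembles.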
Keywords/Search Tags: class imbalance, resampling, rotation forest, ensemble learning