The Algorithm Of Class Unbalance Ensemble Classifier Based On Sampling And Feature Transformation

Posted on:2019-03-21

Degree:Master

Type:Thesis

Country:China

Candidate:H F Wu

Full Text:PDF

GTID:2428330545489815

Subject:Systems analysis and integration

Abstract/Summary:

The class-imbalance problem, also known as the imbalanced-class or rare-class problem, is one of the most widely studied topics in pattern recognition and machine learning. In a two-class problem, class imbalance means that one class (the majority class) has far more instances than the other (the minority class). It is widely accepted that misclassifying a minority instance costs significantly more than misclassifying a majority instance. However, traditional classification methods try to learn high-accuracy models under the assumption that each class has a similar number of instances, so minority class instances are often neglected and misclassified into the majority class.

Ensemble learning is one of the most commonly used approaches to class-imbalanced problems. Existing research can be roughly divided into three categories: ensemble learning based on Bagging, ensemble learning based on Boosting, and hybrid methods. The first two combine a sampling method with Bagging or Boosting so that the learned model pays more attention to minority class instances. The third combines the first two to gain the advantages of both Bagging and Boosting while improving the classifier's performance on the imbalanced class. The key to successful ensemble learning is to build base classifiers that are both diverse and accurate.

Unlike the above methods, this thesis proposes an ensemble learning algorithm based on sampling and feature transformation. It ensures both the diversity among base classifiers and the accuracy of each one, thereby improving model performance on class-imbalanced data sets. The method learns the base classifiers iteratively; each iteration proceeds as follows: 1) under-sample the original data set to obtain a balanced data set, then apply random sampling within the balanced data set to obtain a new data set, and learn a transformation matrix on the new data set; 2) map the balanced data into the new data space using the transformation matrix to obtain a new training set, and learn a base classifier on it. The first under-sampling in step 1 helps the learned transformation matrix better capture the characteristics of the imbalanced class; the second sampling in step 1 increases the differences between feature transformations, and thus between base classifiers. Step 2 trains the base classifiers on balanced data sets to improve their generalization performance on the imbalanced class. Experimental results show that, compared with other state-of-the-art methods, the proposed method achieves better generalization performance in terms of accuracy, recall, G-mean, F-measure and AUC.
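
The two-step iteration described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation. The abstract does not say how the transformation matrix is learned, which base classifier is used, or how the base classifiers are combined, so the sketch assumes a PCA projection as the transformation, a simple nearest-centroid base classifier, and majority voting over binary labels 0/1. All function names and parameters (fit_ensemble, subsample, n_components, and so on) are hypothetical.

```python
import numpy as np

def undersample(X, y, rng):
    """Step 1a: randomly keep as many rows of each class as the smallest class has."""
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    idx = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    return X[idx], y[idx]

def learn_transform(X, n_components):
    """ASSUMPTION: PCA stands in for the unspecified transformation matrix."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    _, vecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    return mean, vecs[:, -n_components:]   # keep the top components

class NearestCentroid:
    """ASSUMPTION: a minimal base classifier chosen only for illustration."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        dist = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[dist.argmin(axis=1)]

def fit_ensemble(X, y, n_estimators=10, subsample=0.75, n_components=2, seed=0):
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(n_estimators):
        Xb, yb = undersample(X, y, rng)                    # step 1: balanced set
        m = max(2, int(subsample * len(Xb)))
        sub = rng.choice(len(Xb), size=m, replace=False)   # step 1: second sampling
        mean, W = learn_transform(Xb[sub], n_components)   # step 1: transformation
        Z = (Xb - mean) @ W                                # step 2: map balanced data
        members.append((mean, W, NearestCentroid().fit(Z, yb)))
    return members

def predict_ensemble(members, X):
    """ASSUMPTION: majority vote over binary labels 0/1."""
    votes = np.stack([clf.predict((X - mean) @ W) for mean, W, clf in members])
    return (votes.mean(axis=0) >= 0.5).astype(int)
```

Because each member sees a different under-sample and a different random subset when learning its projection, the members differ from one another, which is the source of diversity the abstract attributes to the second sampling step.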

Keywords/Search Tags:

imbalanced learning, under-sampling, feature transformation, ensemble

Related items

1	Comprehensive Oversampling And Undersampling Study Of Imbalanced Data Sets
2	Research On Imbalanced Data Classification Based On Sampling Method And Ensemble Learning
3	Hybrid Ensemble Learning For Imbalanced Data
4	Research On Imbalanced Data Classification Algorithms Based On Ensemble Learning
5	A Non-sampling AdaBoost With Information Entropy For Imbalanced Learning
6	Classification In Imbalanced Data Based On Over-Sampling And Ensemble Learning
7	Imbalanced Data Classification Algorithm Based On Unsupervised Intelligent Under Sampling Method
8	Research On Unbalanced Learning Based On Sampling Method
9	An Analysis Of Combining Ensemble Sampling And Feature Selection Methods For Imbalanced Multi-Class Internet Traffic Classification
10	Imbalanced Data Classification And Its Application In The Prediction Of The Mobile Phone Replacement