
Research On Oversampling Ensemble Learning Algorithm For Unbalanced Classification

Posted on: 2020-07-14
Degree: Master
Type: Thesis
Country: China
Candidate: C Q Qi
Full Text: PDF
GTID: 2428330590474194
Subject: Computer technology
Abstract/Summary:
Unbalanced sample classification refers to pattern classification problems in which the samples of one class far outnumber those of the other classes; the focus is on identifying the minority class. Traditional classifiers, in pursuit of global accuracy, tend to misclassify minority samples into the majority class. To address this problem, we propose solutions at both the data level and the algorithm level to improve recognition accuracy for the minority class.

Unbalanced classification methods are mainly considered at the data level and the algorithm level. At the data level, oversampling algorithms add minority-class information and help identify the minority class. However, the BorderLine oversampling algorithm and the Adaptive SMOTE oversampling algorithm identify boundary minority samples inaccurately, which leads to poor selection of seed minority samples and makes it difficult to synthesize data that conforms to the sample distribution. At the algorithm level, the loss function of an existing classifier is modified so that the classifier pays more attention to the minority class and thus recognizes minority samples more accurately. Here, the sample-weight update of the cost-sensitive ADC2 algorithm only considers the influence of the base classifier's accuracy on the next round of sample weights and ignores the influence of the sample distribution. The AdaBoost algorithm suffers from the total misclassification weight of the minority class being smaller than that of the majority class, which lowers the classification accuracy of the minority class.

In this paper, to address these shortcomings of the ensemble classifiers and oversampling methods on imbalance problems, the original algorithms are improved to raise the recognition accuracy of minority samples. For the problem of inaccurate sampling-ratio calculation, an improved weighted oversampling algorithm and a majority-class weighted-voting oversampling algorithm are proposed. The improved weighted oversampling algorithm oversamples each minority sample in proportion to its distance from the class boundary, while in the weighted-voting algorithm each minority sample's sampling proportion is determined by the weighted votes cast by the majority class on that sample, so that the oversampling ratio of each minority sample is more accurate. For the problem that the cost-sensitive ADC2 algorithm ignores the sample distribution when adjusting weights, an ensemble learning algorithm with a dynamic weight-adjustment factor is proposed: the ratio of the two classes' total sample weights is used as a factor to rescale the sample weights, so that the AdaBoost ensemble pays more attention to the minority class. To address AdaBoost's small total misclassification weight for the minority class, the BalanceBoost algorithm is proposed; it modifies AdaBoost so that each class carries an equal sum of misclassification weight and is therefore treated equally. Finally, combining the weighted oversampling algorithm with BalanceBoost yields the weighted oversampling BalanceBoost ensemble learning algorithm, which is applied to the unbalanced sample classification problem.
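The abstract gives no pseudocode, so the sketch below is only an illustration of the two ideas it describes, not the author's exact algorithms: (1) boundary-proximity-weighted oversampling, where minority samples closer to the majority class are oversampled more, and (2) an AdaBoost-style loop whose weight update is rescaled by the ratio of the two classes' total weights. All names and parameters (distance_weighted_oversample, balance_boost, k, n_rounds, the 1/d weighting, the +1/-1 label convention with +1 as the minority class) are illustrative assumptions.

```python
# Minimal, self-contained sketch of the ideas summarized above (binary case,
# labels +1 = minority, -1 = majority). Hypothetical implementation, not the
# thesis algorithm.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeClassifier


def distance_weighted_oversample(X_min, X_maj, n_new, k=5, seed=0):
    """SMOTE-style synthesis where seed samples are drawn with probability
    proportional to their closeness to the majority class (the boundary)."""
    rng = np.random.default_rng(seed)
    # Distance from every minority sample to its nearest majority neighbour.
    d = NearestNeighbors(n_neighbors=1).fit(X_maj).kneighbors(X_min)[0].ravel()
    p = 1.0 / (d + 1e-12)          # closer to the boundary -> sampled more often
    p /= p.sum()
    nn = NearestNeighbors(n_neighbors=min(k, len(X_min) - 1) + 1).fit(X_min)
    synthetic = []
    for i in rng.choice(len(X_min), size=n_new, p=p):
        # Interpolate towards a random minority neighbour (skip the point itself).
        neighbours = nn.kneighbors(X_min[i:i + 1], return_distance=False)[0][1:]
        j = rng.choice(neighbours)
        synthetic.append(X_min[i] + rng.random() * (X_min[j] - X_min[i]))
    return np.asarray(synthetic)


def balance_boost(X, y, n_rounds=20):
    """AdaBoost-style loop whose weight update is rescaled by the ratio of the
    classes' total weights, so the minority class keeps a comparable share."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    learners, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = w[pred != y].sum()
        if err <= 0 or err >= 0.5:             # stop if the stump is useless
            break
        alpha = 0.5 * np.log((1.0 - err) / err)
        # Dynamic adjustment factor: inflate minority weights by the ratio of the
        # majority class's total weight to the minority class's total weight.
        ratio = w[y == -1].sum() / max(w[y == 1].sum(), 1e-12)
        factor = np.where(y == 1, ratio, 1.0)
        w = w * factor * np.exp(-alpha * y * pred)
        w /= w.sum()
        learners.append(stump)
        alphas.append(alpha)
    return learners, np.asarray(alphas)


def predict(learners, alphas, X):
    """Weighted vote of the boosted ensemble, returning +1 or -1."""
    score = sum(a * h.predict(X) for a, h in zip(alphas, learners))
    return np.sign(score)
```

In use, one would first synthesize minority points with distance_weighted_oversample, append them to the training set, and then fit balance_boost on the augmented data, mirroring the combined weighted-oversampling BalanceBoost pipeline described in the abstract.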
Keywords/Search Tags: unbalanced classification, weighted voting oversampling, dynamic weight adjustment, ensemble learning, BalanceBoost