
Research on Imbalanced Data Classification Methods Based on Resampling and Ensemble Learning

Posted on: 2021-12-08
Degree: Master
Type: Thesis
Country: China
Candidate: K H Yang
GTID: 2518306452464224
Subject: Computer software and theory
Abstract/Summary:
Advances in data collection and storage technology have led to the accumulation of huge amounts of data in many fields, such as the Internet, finance, and engineering. Machine learning can efficiently find features and rules in data, and when combined with traditional data analysis technologies it can help extract useful knowledge from large, complex data sets. The class-imbalance problem is a recognized difficulty in machine learning and has great research significance and value. This thesis addresses class imbalance at two levels, data and algorithms. Because the original generative adversarial network (GAN) is difficult to train and prone to mode collapse, the Wasserstein generative adversarial network (WGAN) is used as the basic data synthesis method and combined with algorithms such as ensemble learning to improve the performance of imbalanced learning.

To reduce the influence of noise on imbalanced learning, a noise filter based on K nearest neighbors (KF) is designed to filter the noise present in imbalanced data. WGAN is used to oversample the minority class, and RUSBoost, an imbalanced learning algorithm that combines sampling with the Boosting framework, is used as an adversarial sample selector (ASS) to filter out low-quality adversarial samples generated by WGAN. The resulting framework, KF-WGAN-ASS, combines noise filtering, oversampling, and adversarial sample selection. KF-WGAN-ASS is experimentally verified on six imbalanced data sets from the UCI database. The results show that KF-WGAN-ASS performs better than the other sampling methods compared. In addition, experiments that add noise to the original data sets verify the influence of the noise filter's parameter selection on algorithm performance.

Based on resampling and Bagging, an imbalanced data classification method, WGAN-DSR-Bagging, is proposed. WGAN is used to oversample the minority class, which improves the reliability of the synthesized minority samples. Training subsets are constructed under differentiated sampling rates (DSR) to improve the diversity of the base classifiers in the Bagging framework. Together these form an imbalanced data classification framework combining WGAN, differentiated sampling, and Bagging. The method is experimentally verified on a data set for analyzing customers' abnormal power consumption. Compared with WGAN oversampling, asBagging, and SMOTE-DSR-Bagging, WGAN-DSR-Bagging performs better on the AUC, F-measure, and G-mean measures. Experiments on training sets with different imbalance ratios show that the algorithm has better performance and stability than asBagging.
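
The abstract names a K-nearest-neighbor noise filter and training subsets built under differentiated sampling rates, but gives no rules or parameters for either. The Python sketch below is therefore only one plausible reading, not the thesis's implementation. The function names (`knn_noise_filter`, `dsr_subsets`), the parameters `k`, `min_agree`, `rates`, and `seed`, the "keep a sample if enough neighbors share its label" rule, and the choice to vary the majority-class undersampling rate per subset are all assumptions made for illustration.

```python
import math
import random


def knn_noise_filter(X, y, k=3, min_agree=1):
    """Drop samples whose k nearest neighbours mostly carry another label.

    Assumption: a sample is kept when at least `min_agree` of its k nearest
    neighbours (Euclidean distance) share its label. The abstract does not
    state the actual rule or parameter values.
    """
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        # Distances from sample i to every other sample, nearest first.
        dists = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )
        agree = sum(1 for _, j in dists[:k] if y[j] == yi)
        if agree >= min_agree:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def dsr_subsets(X, y, minority_label, rates, seed=0):
    """Build one training subset per sampling rate.

    Assumption: each subset keeps every minority sample and draws
    round(rate * n_minority) majority samples without replacement, so the
    subsets differ in class ratio and the base classifiers see different data.
    """
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label != minority_label]
    subsets = []
    for rate in rates:
        # Clamp so we never request more majority samples than exist.
        n_major = min(len(majority), max(1, round(rate * len(minority))))
        chosen = minority + rng.sample(majority, n_major)
        subsets.append(([X[i] for i in chosen], [y[i] for i in chosen]))
    return subsets
```

In a pipeline of the kind the abstract outlines, the filter would run before oversampling, and each subset returned by `dsr_subsets` would train one base classifier whose predictions are then combined by voting. The WGAN oversampling and RUSBoost selection steps are omitted here because the abstract gives no detail from which to sketch them.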
Keywords/Search Tags:class-imbalance, generative adversarial network, ensemble learning, K nearest neighbor, differentiated sampling rates