Research And Application Of Imbalanced Data Classification Based On Oversampling And Ant Colony Optimization Resampling

Posted on: 2021-05-28
Degree: Master
Type: Thesis
Country: China
Candidate: A Xiong
Full Text: PDF
GTID: 2491306473954799
Subject: Power Engineering
Abstract/Summary:
Many sampling-based preprocessing methods have been proposed to address the problem of imbalanced data classification. Their basic principle is to rebalance the data set through a specific strategy. For the imbalanced classification problem, this thesis proposes a novel ant colony optimization resampling (ACOR) algorithm. ACOR consists of two main steps: first, the imbalanced data set is balanced with a specific oversampling algorithm; second, a (sub)optimal subset of the balanced data set is found through ant colony optimization. Unlike other oversampling techniques, ACOR is not concerned with the mechanism for generating new samples. Its main advantage is that it makes full use of an existing oversampling algorithm and obtains an ideal training set through ant colony optimization. ACOR can therefore improve the performance of existing oversampling algorithms. Experimental results on 18 imbalanced data sets show that, compared with four commonly used oversampling methods (SMOTE, BSO, ROS, ADASYN), ACOR achieves better classification results on common performance evaluation metrics (AUC, G-mean, BACC).

The specific tasks are as follows:

First, this thesis reviews the current state of research on imbalanced data classification methods and on wastewater treatment fault diagnosis, studies the faults of wastewater biochemical treatment and their diagnosis methods, and introduces the performance evaluation metrics and classification algorithms used for imbalanced data classification.

Then, a variety of oversampling algorithms are studied, and their respective advantages and disadvantages are introduced. For imbalanced data sets, this thesis proposes a sampling algorithm based on ant colony optimization. Simulation experiments show that ACOR can enhance the classification performance of the oversampling algorithm. It is applicable to various oversampling algorithms and serves as a general framework.

Finally, ACOR is applied to fault diagnosis in wastewater treatment, a typical imbalanced classification problem. Experimental results show that ACOR effectively improves classification performance on this data set and helps distinguish the operating states of the wastewater treatment process.
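The two-step idea described in the abstract (oversample first, then search the balanced set for a good subset) can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not give the pheromone rule, the classifier, or the fitness function, so everything below is an assumption chosen for brevity. The sketch uses random oversampling (ROS, one of the methods named above), a 1-nearest-neighbour classifier, G-mean on a held-out validation set as the fitness, and a simple keep/drop pheromone per sample. All function and parameter names (`random_oversample`, `aco_select`, `rho`, `n_ants`) are hypothetical.

```python
import math
import random

def random_oversample(X, y, rng):
    """Step 1 (ROS): duplicate random minority samples until classes are equal in size."""
    idx = {c: [i for i, label in enumerate(y) if label == c] for c in set(y)}
    minority = min(idx, key=lambda c: len(idx[c]))
    majority = max(idx, key=lambda c: len(idx[c]))
    n_extra = len(idx[majority]) - len(idx[minority])
    extra = [rng.choice(idx[minority]) for _ in range(n_extra)]
    return X + [X[i] for i in extra], y + [y[i] for i in extra]

def predict_1nn(train_X, train_y, x):
    """Assumed classifier: label of the nearest training sample (Euclidean distance)."""
    nearest = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return train_y[nearest]

def g_mean(train_X, train_y, val_X, val_y):
    """Geometric mean of the per-class recalls (labels 0 and 1) on the validation set."""
    if len(set(train_y)) < 2:
        return 0.0  # a subset that lost a whole class is treated as useless
    preds = [predict_1nn(train_X, train_y, x) for x in val_X]
    recalls = []
    for c in (0, 1):
        total = sum(1 for t in val_y if t == c)
        hits = sum(1 for t, p in zip(val_y, preds) if t == c and p == c)
        recalls.append(hits / total)
    return math.sqrt(recalls[0] * recalls[1])

def aco_select(X, y, val_X, val_y, n_ants=10, n_iter=20, rho=0.2, seed=0):
    """Step 2 (assumed ACO variant): each ant keeps sample i with probability tau[i]."""
    rng = random.Random(seed)
    n = len(X)
    tau = [0.5] * n  # pheromone, read directly as a keep-probability
    best_mask = [True] * n  # start from the full balanced set
    best_fit = g_mean(X, y, val_X, val_y)
    for _ in range(n_iter):
        for _ in range(n_ants):
            mask = [rng.random() < tau[i] for i in range(n)]
            sub_X = [x for x, keep in zip(X, mask) if keep]
            sub_y = [t for t, keep in zip(y, mask) if keep]
            fit = g_mean(sub_X, sub_y, val_X, val_y)
            if fit > best_fit:
                best_mask, best_fit = mask, fit
        # Evaporate, then pull tau toward the best-so-far subset; clamp to keep exploring.
        tau = [min(0.95, max(0.05, (1 - rho) * t + rho * (1.0 if keep else 0.0)))
               for t, keep in zip(tau, best_mask)]
    return best_mask, best_fit

# Toy demo on synthetic 2-D data (40 majority vs. 6 minority training samples).
rng = random.Random(42)

def blob(center, n):
    return [[rng.gauss(center, 1.0), rng.gauss(center, 1.0)] for _ in range(n)]

train_X, train_y = blob(0.0, 40) + blob(2.0, 6), [0] * 40 + [1] * 6
val_X, val_y = blob(0.0, 20) + blob(2.0, 6), [0] * 20 + [1] * 6

bal_X, bal_y = random_oversample(train_X, train_y, rng)
baseline = g_mean(bal_X, bal_y, val_X, val_y)
mask, fit = aco_select(bal_X, bal_y, val_X, val_y)
print(f"G-mean with ROS only: {baseline:.3f}; after subset search: {fit:.3f}")
print(f"kept {sum(mask)} of {len(mask)} balanced samples")
```

Because the search starts from the full balanced set and only accepts strict improvements, the selected subset never scores below the oversampling-only baseline on the validation data. That score is optimistic, since the subset was chosen on the same validation set, so a fair comparison would report metrics on a separate test set. Replacing `random_oversample` with another oversampler leaves `aco_select` unchanged, which mirrors the abstract's description of ACOR as a general framework.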
Keywords/Search Tags: Machine learning, Imbalanced learning, Oversampling, Ant colony optimization resampling, Fault diagnosis