Research On Classification Algorithms For Unbalanced Data

Posted on: 2021-01-06 | Degree: Master | Type: Thesis
Country: China | Candidate: X Y Xu | Full Text: PDF
GTID: 2428330614463744 | Subject: Computer Application Technology
Abstract/Summary:
In practical applications, the distribution of class labels is often unbalanced, and the minority samples are usually the ones of greatest interest. It is therefore necessary to study classification methods for unbalanced data. Research on this problem proceeds mainly along two lines, data sampling and algorithm improvement; when the class distribution is extremely unbalanced, the problem can also be approached from the perspective of anomaly detection. This thesis makes the following three contributions.

(1) From the perspective of data sampling, resampling the data set before classification can often improve classification performance. The SMOTE (Synthetic Minority Oversampling Technique) algorithm does not consider where the newly generated minority samples are placed. To address this, an improved Safe-Level-SMOTE algorithm is introduced, and TempC-SSMOTE, an oversampling method based on temporary markers, is proposed. The method places newly generated minority samples closer to the regions where minority samples are concentrated, reduces the oversampling scale, and mitigates the tendency of oversampling algorithms to generate noise samples. The experimental results show that, measured by F1 value, Recall, and G-mean, TempC-SSMOTE outperforms other common sampling methods, which supports the feasibility of the proposed method.

(2) From the perspective of classification algorithms, ensemble learning is an important approach to classifying unbalanced data sets. This work combines the CMAES (Covariance Matrix Adaptation Evolution Strategy) algorithm with ensemble learning and proposes an ensemble learning method based on CMAES. The method uses CMAES to adaptively train the combination weights of the base learners, thereby improving classification performance. The experimental results show that, measured by F-value and accuracy (Acc), the CMAES-based ensemble method integrates the base learners effectively and performs better than common ensemble learning methods.

(3) From the perspective of anomaly detection, common anomaly detection algorithms generally use an anomaly score to decide whether a sample is anomalous. This approach depends heavily on the choice of threshold and does not use label information. This thesis proposes combining anomaly detection with a classification model: anomaly detection ideas are used to process the data features and mine the intrinsic information of the data, a greedy method is then used to combine the newly generated features, and the result is finally fed to a classifier. The experimental results show that, measured by F1 value, Recall, and G-mean, feature processing based on anomaly detection can significantly improve classification performance on extremely unbalanced problems.
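
The abstract reports results in terms of Recall, F1 value, G-mean, and accuracy. As a rough illustration of why these measures are preferred over accuracy alone on unbalanced data, the following is a minimal sketch using their standard textbook definitions; the function name `imbalance_metrics` and the toy labels are assumptions made here for illustration and do not come from the thesis.

```python
import math


def imbalance_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, recall, precision, F1 and G-mean.

    `positive` marks the minority class (an assumption of this sketch).
    Ratios with an empty denominator are reported as 0.0.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # minority-class hit rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # majority-class hit rate
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    g_mean = math.sqrt(recall * specificity)              # geometric mean of both rates

    return {"accuracy": accuracy, "recall": recall, "precision": precision,
            "f1": f1, "g_mean": g_mean}


# Toy data: 2 minority samples (label 1) among 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# A classifier that always predicts the majority class.
always_majority = imbalance_metrics(y_true, [0] * 10)

# A classifier that finds one minority sample and raises one false alarm.
partial = imbalance_metrics(y_true, [1, 0, 0, 0, 0, 0, 0, 0, 0, 1])
```

On this toy data both classifiers reach an accuracy of 0.8, but the always-majority classifier has Recall, F1, and G-mean of 0, while the second has Recall and F1 of 0.5. Accuracy hides the difference that the minority-focused measures expose.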
Keywords/Search Tags: data sampling, anomaly detection, unbalanced data, ensemble learning, classification