Research On Classification Algorithms For Unbalanced Data

Posted on: 2021-01-06

Degree: Master

Type: Thesis

Country: China

Candidate: X Y Xu

GTID: 2428330614463744

Subject: Computer application technology

Abstract/Summary:

In practical applications, the distribution of class labels is often unbalanced, and the minority samples are usually the ones of greatest interest. It is therefore necessary to study classification methods for unbalanced data. Research on this problem mainly follows two directions, data sampling and algorithm improvement; when the data distribution is extremely unbalanced, the problem can also be approached from the perspective of anomaly detection. This thesis makes the following three contributions.

(1) From the perspective of data sampling, resampling the data set before classification can often improve classification performance. To address the problem that the SMOTE (Synthetic Minority Oversampling Technique) algorithm does not consider the position of the newly generated minority samples, an improved Safe-Level-SMOTE algorithm is introduced, and TempC-SSMOTE, an oversampling method based on temporary markers, is proposed. The method not only places newly generated minority samples closer to the regions where minority samples are concentrated, but also reduces the oversampling scale and mitigates the tendency of oversampling algorithms to generate noise samples. The experimental results show that, measured by F1, Recall and G-mean, TempC-SSMOTE outperforms other common sampling methods, which supports the effectiveness and feasibility of the proposed method.

(2) From the perspective of classification algorithms, ensemble learning is an important approach to classifying unbalanced data sets. This thesis combines the CMA-ES (Covariance Matrix Adaptation Evolution Strategy) algorithm with ensemble learning and proposes an ensemble learning method based on CMA-ES. The method uses CMA-ES to adaptively train the combination weights of the base learners, thereby improving classification performance. The experimental results show that, measured by F-value and accuracy (Acc), the CMA-ES-based ensemble method integrates the base learners effectively and performs better than common ensemble learning methods.

(3) From the perspective of anomaly detection, common anomaly detection algorithms generally use an anomaly score to decide whether a sample is anomalous. This approach depends on the choice of threshold and does not use label information. This thesis proposes combining anomaly detection with a classification model: the idea of anomaly detection is used to process the data features and mine the intrinsic information of the data, a greedy method is then used to combine the newly generated features, and the result is finally combined with a classifier. The experimental results show that, measured by F1, Recall and G-mean, the feature processing method based on anomaly detection significantly improves classification performance on extremely unbalanced problems.
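
As background for point (1), plain SMOTE creates each synthetic sample by interpolating between a minority sample and one of its nearest minority neighbours. The following is a minimal pure-Python sketch of that baseline only. It is not the Safe-Level-SMOTE or TempC-SSMOTE method of the thesis, whose details the abstract does not give. The function name `smote`, its parameters and the toy data are assumptions made for illustration.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points in the style of plain SMOTE.

    Each point is a random interpolation between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    (Hypothetical helper for illustration; not the thesis method.)
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base sample, excluding itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class (assumed data, two features per sample)
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (1.1, 1.1)]
new_points = smote(minority, n_new=6)
```

Every synthetic point lies on a segment between two existing minority samples, and nothing in the procedure checks how close that point falls to majority samples. This is the limitation that safe-level variants of SMOTE aim to address.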

Keywords/Search Tags:

Data sampling, anomaly detection, unbalanced data, ensemble learning, classification

Related items

1	Research On SVM Classification Of Unbalanced Data And Its Application In Identify Poor Students In Colleges And Universities
2	Classification And Application Of Ensemble Learning In Unbalanced Data
3	Research On Outlier Detection For Unbalanced Data
4	Research And Application Of Integrated Algorithms For Unbalanced Data Sets
5	Research On Imbalanced Data Classification Algorithms Based On Ensemble Learning
6	Application Of Ensemble Learning Based On Improved Mixed Sampling Method In Pre-lending Default Prediction
7	Research And Application Of Classification Algorithm Based On Unbalanced Data
8	Research On Cloud Platform Anomaly Detection Algorithm By Ensemble Learning
9	Research On High-dimensional Unbalanced Data Classification Algorithm Based On Feature Selection And Ensemble Learning
10	The Application Of Ensemble Classification On Unbalanced Data In Bank Marketing