An Unbalanced Data Study Based On The GAN Model

Posted on:2022-08-19

Degree:Master

Type:Thesis

Country:China

Candidate:Y Zhang

Full Text:PDF

GTID:2506306491960309

Subject:Statistics

Abstract/Summary:

Traditional classification models usually assume that the classes in the data are roughly balanced. In practice, however, unbalanced data are increasingly common in fields such as health care, insurance, and finance: the number of samples in some classes is far smaller than in others. A classifier trained for high accuracy on such data tends to assign minority-class samples to the majority class, so the minority class is difficult to predict. As an extreme example, consider a binary data set in which the majority class accounts for 99% of the samples and the minority class for 1%. A classifier can assign every sample to the majority class and still have an error rate of only 1%. In medical diagnosis, where infectious genes are usually far rarer than non-infectious ones, a prediction model that tends to judge infection-causing genes as non-infectious can endanger people’s lives.

This paper discusses how to handle unbalanced data sets in the supervised setting, using binary data sets. First, the unbalanced data set is divided into a training set and a test set. A GAN is then trained on the existing minority-class samples in the training set, and the model generates synthetic samples that are hard to distinguish from the real ones, until the two classes contain the same number of samples. The two classes are combined into a new, balanced data set, on which an XGBoost classifier is trained. Finally, the model is tested on the original unbalanced test set and the AUC value is recorded. Compared with the classical SMOTE method and a clustering-based undersampling method, our method achieves a better AUC, which improves its value in practical applications.
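The 99%/1% example and the choice of AUC as the evaluation metric can be illustrated with a short sketch. It is not the thesis's implementation: it does not train a GAN or XGBoost, and the helper names (`auc`, `accuracy`, `synthetic_needed`), the sample counts, and the 70/30 split are assumptions made for illustration. It only shows that a classifier which always predicts the majority class reaches 99% accuracy while its AUC stays at chance level, and how many synthetic minority samples a generator would have to supply to balance a training set.

```python
def auc(labels, scores):
    """Rank-based AUC: the probability that a randomly chosen positive
    sample scores higher than a randomly chosen negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(labels, preds):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(y == p for y, p in zip(labels, preds)) / len(labels)


def synthetic_needed(train_labels):
    """Number of synthetic minority samples (label 1) needed so that both
    classes have the same count in the training set."""
    n_minority = sum(train_labels)
    n_majority = len(train_labels) - n_minority
    return n_majority - n_minority


# Hypothetical data set: 1000 samples, 1% minority (label 1), 99% majority (label 0).
labels = [1] * 10 + [0] * 990

# Trivial classifier: every sample gets the majority label and the same score.
preds = [0] * len(labels)
scores = [0.0] * len(labels)

acc = accuracy(labels, preds)    # 99% accuracy, yet no minority sample is found
base_auc = auc(labels, scores)   # chance level, because the scores carry no ranking

# Assumed 70/30 stratified split: the training set holds 7 minority and 693 majority samples.
train_labels = [1] * 7 + [0] * 693
n_to_generate = synthetic_needed(train_labels)

print(f"accuracy={acc:.2f}, AUC={base_auc:.2f}, synthetic samples needed={n_to_generate}")
```

Only the training set is balanced in this sketch; the test set keeps its original class ratio, matching the evaluation described in the abstract.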

Keywords/Search Tags:

Unbalanced data, GAN, Cluster sampling, XGBoost

Related items

1	Research On Sampling Of Tax-checking
2	Research On Sampling Evidence In Criminal Procedure
3	Studies On Criterion Making--Sampling Of Ornament Plants
4	Research On Victims’ Sample Sampling In Internet Crime
5	Research On Forecasting The Risk Classification Of Prisoners
6	A Preliminary Study On The Method Of Sampling Forensics In Cases Of Telecommunication Network-related Fraud
7	Research On Foreign Object Detection And Recognition Technology Of Unbalanced Small Sample Based On Deep Learning
8	Spatio-temporal Analysis Of Arson And Identification Of Gang-related Arson Based On Data Mining
9	Design And Implementation Of A Lawyer Recommendation System Based On XGBoost
10	Criminal Compulsory Sampling And Research On The Procedural Control