
Research On The Application Of Generative Adversarial Networks In Class Imbalance

Posted on: 2021-03-16
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Liu
Full Text: PDF
GTID: 2428330647961397
Subject: Applied statistics
Abstract/Summary:
With the rapid development of data generation and collection technology, huge amounts of data are added to servers across industries every day, bringing us into the era of big data. Data classification algorithms are developing rapidly, but traditional classification algorithms perform poorly on data sets with imbalanced classes. The class imbalance problem arises when the number of samples in one class is far larger or far smaller than in the other classes, which causes traditional classification models to fail. Algorithms designed to address this problem are generally called class-imbalance learning algorithms. Class-imbalance learning has a wide range of applications, such as medical diagnosis, text classification, image recognition, network security, financial fraud detection, industrial inspection, and software testing. In-depth research on this technology therefore has both theoretical significance and broad practical value.

Oversampling is a popular research direction in the field of class imbalance. Since the invention of SMOTE, oversampling has moved from random oversampling to the synthesis of new samples, and many classic oversampling algorithms have emerged in this direction. Generative adversarial networks (GAN) offer a new approach to data generation, which is of positive significance for the development of oversampling algorithms.

In this thesis, four classic oversampling algorithms, represented by SMOTE, are used to balance 12 data sets with different degrees of class imbalance. The results show that an appropriate oversampling algorithm can improve a classifier's performance on imbalanced data. GAN is then applied to the class imbalance problem to generate minority-class samples and thereby balance the data set, and an oversampling algorithm, GAN-OS, is proposed. On the same 12 data sets, GAN-OS oversamples effectively, and on some data sets it outperforms SMOTE. During oversampling, GAN-OS is trained with the least squares loss function to generate data. Compared with the cross-entropy loss used by the original GAN, this greatly improves the stability of the model. The work provides feasible directions and useful supplementary suggestions for improving the performance of machine learning classifiers on the class imbalance problem.
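The two mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the GAN-OS implementation, which the abstract does not specify. It shows only the textbook idea behind SMOTE (interpolating between a minority sample and one of its nearest minority neighbours) and the standard forms of the least squares and cross-entropy discriminator losses. All function names, the parameter `k`, and the toy data are assumptions made for illustration.

```python
import math
import random


def smote_like_sample(minority, k=3, rng=random):
    """Create one synthetic point between a minority sample and a near neighbour.

    Illustrative only: assumes `minority` holds at least two points.
    """
    base = rng.choice(minority)
    # Rank the remaining minority points by Euclidean distance to the base point.
    others = [p for p in minority if p is not base]
    others.sort(key=lambda p: math.dist(base, p))
    neighbour = rng.choice(others[:k])
    gap = rng.random()  # interpolation factor in [0, 1)
    return [b + gap * (n - b) for b, n in zip(base, neighbour)]


def least_squares_d_loss(d_real, d_fake):
    """Least squares discriminator loss: targets are 1 for real, 0 for generated."""
    real_term = sum((d - 1.0) ** 2 for d in d_real) / len(d_real)
    fake_term = sum(d ** 2 for d in d_fake) / len(d_fake)
    return 0.5 * (real_term + fake_term)


def cross_entropy_d_loss(d_real, d_fake, eps=1e-12):
    """Cross-entropy discriminator loss of the original GAN (outputs in (0, 1))."""
    real_term = -sum(math.log(d + eps) for d in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - d + eps) for d in d_fake) / len(d_fake)
    return real_term + fake_term


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical minority class with four 2-D samples.
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(smote_like_sample(minority, k=2))
    print(least_squares_d_loss([0.9, 0.8], [0.2, 0.1]))
    print(cross_entropy_d_loss([0.9, 0.8], [0.2, 0.1]))
```

The least squares loss penalises discriminator outputs by their squared distance from the target, so its gradient shrinks smoothly rather than saturating the way the logarithmic loss can. This is one commonly cited reason for the more stable training the abstract reports.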
Keywords/Search Tags: class imbalance, oversampling, SMOTE, GAN, least squares loss function