
Research On Classification Of Imbalanced Dataset Based On Generative Adversarial Networks

Posted on: 2020-05-24
Degree: Master
Type: Thesis
Country: China
Candidate: X B Xie
GTID: 2428330590495564
Subject: Signal and Information Processing
Abstract/Summary:
In recent years, with the rise of artificial intelligence, deep learning has become one of its main research fields. Deep learning is a data-driven learning method that places high demands on both the quantity and the quality of data. In many applications these demands can be met, but in some areas, such as financial risk control and fault detection, the amounts of normal and abnormal data are very unequal. Datasets in which the proportions of the different classes are extremely skewed are called imbalanced datasets. Because the classes in an imbalanced dataset carry different amounts of information and differ in importance during training, traditional classifiers have difficulty classifying such datasets, and common evaluation metrics have difficulty assessing the classifiers correctly.

This thesis proposes methods for classifying imbalanced datasets based on the Generative Adversarial Network (GAN). A GAN consists of a generator and a discriminator. The generator tries to fit the distribution of the real input data as closely as possible, while the discriminator tries to judge whether a sample comes from the generator or from the real data. The two compete with and improve each other until a Nash equilibrium is reached. The strong generative ability of a GAN can be used to augment the minority-class samples of an imbalanced dataset.

First, the thesis reviews traditional classification algorithms, the algorithms currently in common use for imbalanced data classification, and the evaluation of their results. It then describes a classification method for imbalanced datasets based on random oversampling, which serves as a baseline.

Second, the thesis proposes a classification method for imbalanced datasets based on WGAN (Wasserstein GAN), which uses the stable generation ability of WGAN to address the insufficient diversity and stability of synthetic minority samples. WGAN modifies the loss function and network structure of the original GAN to make training more stable. The method uses WGAN to synthesize a large number of minority samples so that the two classes are balanced. A logistic regression model and the WGAN discriminator are trained on the balanced dataset, and each is then used to classify the test set. Experiments are carried out on a credit card fraud dataset. With WGAN-based data augmentation the recall is 88%, compared with 85% for random oversampling and only 52% when the raw data is used directly.

Third, because some imbalanced datasets contain too few minority-class samples, the thesis proposes a classification method based on CycleGAN, which uses CycleGAN's inter-domain conversion ability to overcome both the shortage of minority samples and the resulting limit on generation quality. CycleGAN uses two generators and two discriminators to translate samples between the two domains. Its unpaired inter-domain conversion ability is used to transform majority-class samples into minority-class samples so that the two classes are balanced. A VGG network is then trained for classification and compared with traditional data augmentation methods. Experiments on face datasets demonstrate the effectiveness of data augmentation with CycleGAN.

This thesis mainly uses the generative ability of GANs to augment imbalanced datasets and then classifies them with traditional classifiers. Future work can use the strong discriminative ability of the discriminator to classify the balanced datasets.
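
The random oversampling baseline and the role of recall as an evaluation metric can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes a binary problem with label 0 for the majority class and label 1 for the minority class, uses toy data rather than the credit card fraud dataset, and the function names random_oversample, recall, and accuracy are hypothetical names chosen for this illustration.

```python
import random


def random_oversample(samples, labels, minority_label, seed=0):
    """Duplicate randomly chosen minority samples until the minority class
    is as large as the rest of the data (binary setting assumed)."""
    rng = random.Random(seed)
    minority = [s for s, y in zip(samples, labels) if y == minority_label]
    if not minority:
        raise ValueError("no minority samples to resample")
    majority_count = sum(1 for y in labels if y != minority_label)
    n_extra = max(0, majority_count - len(minority))
    # Sampling with replacement: the new points are copies, not new data.
    extra = [rng.choice(minority) for _ in range(n_extra)]
    return list(samples) + extra, list(labels) + [minority_label] * n_extra


def recall(y_true, y_pred, positive_label):
    """Fraction of true positive-class samples that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred)
             if t == positive_label and p == positive_label)
    fn = sum(1 for t, p in zip(y_true, y_pred)
             if t == positive_label and p != positive_label)
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(y_true, y_pred):
    """Fraction of all samples that were predicted correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


if __name__ == "__main__":
    # Toy imbalanced data: 95 majority (0) and 5 minority (1) samples.
    xs = [[float(i)] for i in range(100)]
    ys = [0] * 95 + [1] * 5

    # A trivial classifier that always predicts the majority class looks
    # good on accuracy but finds none of the minority samples.
    always_majority = [0] * len(ys)
    print("accuracy:", accuracy(ys, always_majority))
    print("recall:  ", recall(ys, always_majority, positive_label=1))

    # After oversampling, both classes have the same number of samples.
    bx, by = random_oversample(xs, ys, minority_label=1)
    print("class counts:", by.count(0), by.count(1))
```

Because random oversampling only copies existing minority samples, it adds no new information about the minority distribution. That limitation is what the GAN-based augmentation described in the abstract aims to address by synthesizing new minority samples instead.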
Keywords/Search Tags:Imbalanced, Data Augmentation, GAN, WGAN, CycleGAN, Generator