
Imbalanced Data Enhancement Algorithm Based On GAN And Its Application Research

Posted on: 2020-01-14 | Degree: Master | Type: Thesis
Country: China | Candidate: Z Li | Full Text: PDF
GTID: 2428330578954630 | Subject: Computer technology
Abstract/Summary:
With the rapid development of machine learning and data mining, a growing body of research aims at more accurate prediction and inference by mining the rules and characteristics hidden in massive data. Automatic classification of massive data is one of the most commonly used ways to improve the efficiency of information acquisition. Traditional classification methods assume that the class distribution is roughly balanced. In practice, however, imbalanced data arises in many areas of everyday life, such as network attack identification and cancer detection, and imbalanced data enhancement algorithms have received increasing attention in recent years.

The mainstream existing approach combines a sampling algorithm with an ensemble learning algorithm, as in SMOTEBoost, RUSBoost and EUSBoost. At initialization, these algorithms give every sample the same weight. Each classifier is then trained separately, and the sample weights are continually adjusted based on feedback from the error rate, so that a better-performing classifier is finally obtained. In some specific cases, however, these algorithms depend too heavily on the original datasets.

To address these problems, this paper proposes using the Generative Adversarial Network (GAN) to solve the problem of imbalanced data classification. The main contributions can be summarized as follows.

(1) To address insufficient training samples, an imbalanced data enhancement algorithm based on GAN is proposed. The network generates images to form new datasets, from which image features are then extracted and classified. Experiments show that the images generated by this method are diverse and that the classification results improve.

(2) Because some datasets contain poor-quality images that affect the final classification results, an imbalanced data enhancement algorithm based on GAN and ensemble learning is proposed. ENN (Edited Nearest Neighbours) and Tomek Link are used for data cleaning. A new ensemble learning classifier is then built by the voting method, combining multiple single learning models into a unified ensemble learning classifier model. This method yields more accurate, stable and robust classification results.

Both studies achieved their expected goals in experimental verification. Experiments on four datasets show that the imbalanced data enhancement algorithm based on GAN significantly improves classification accuracy and can also effectively synthesize realistic images.
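The cleaning and voting steps named in contribution (2) can be illustrated with a minimal sketch. This is not the thesis's implementation: it works on toy 2-D points rather than image features, and the function names (`tomek_links`, `enn_keep_mask`, `hard_vote`), the neighbourhood size `k=3`, and the toy data are all assumptions made for illustration. It only shows the standard definitions: a Tomek link is a pair of mutual nearest neighbours with different labels, ENN drops a sample whose label disagrees with the majority of its k nearest neighbours, and hard voting takes the most common prediction across models.

```python
from collections import Counter
from math import dist


def neighbors_by_distance(i, points):
    """Indices of all other points, sorted by distance to points[i]."""
    others = (j for j in range(len(points)) if j != i)
    return sorted(others, key=lambda j: dist(points[i], points[j]))


def tomek_links(points, labels):
    """Pairs (i, j), i < j, that are mutual nearest neighbours with different labels."""
    nn = [neighbors_by_distance(i, points)[0] for i in range(len(points))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and labels[i] != labels[j]]


def enn_keep_mask(points, labels, k=3):
    """Edited Nearest Neighbours: keep a sample only if its label matches
    the majority label among its k nearest neighbours."""
    keep = []
    for i in range(len(points)):
        nearest = neighbors_by_distance(i, points)[:k]
        majority = Counter(labels[j] for j in nearest).most_common(1)[0][0]
        keep.append(labels[i] == majority)
    return keep


def hard_vote(predictions):
    """Majority vote per sample; `predictions` holds one list per model."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Hypothetical toy data: class 0 clusters near the origin, class 1 near (5, 5),
# with one class-0 point sitting inside the class-1 cluster.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5.0, 5.1), (5, 5), (5, 6), (6, 5)]
labels = [0, 0, 0, 0, 0, 1, 1, 1]

print("Tomek links:", tomek_links(points, labels))
print("ENN keep mask:", enn_keep_mask(points, labels))
print("Voted labels:", hard_vote([[0, 1, 1], [0, 0, 1], [1, 1, 1]]))
```

In practice, library implementations such as `EditedNearestNeighbours` and `TomekLinks` in imbalanced-learn and `VotingClassifier` in scikit-learn would likely be used in place of these hand-written helpers; whether the thesis used them is not stated in the abstract.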
Keywords/Search Tags:Imbalanced data enhancement, GAN, DCGAN, ENN, Ensemble learning