
Research On Imbalanced Data Augmentation And Imbalanced Classification Based On Auto-Encoder

Posted on: 2022-03-13
Degree: Master
Type: Thesis
Country: China
Candidate: Z X Cai
GTID: 2518306563475694
Subject: Computer Science and Technology
Abstract/Summary:
Imbalanced classification arises frequently in real-world applications such as bioinformatics, telecommunications, financial risk assessment, and text classification. Traditional classification methods aim to maximize overall accuracy, so they often neglect the minority classes. In practice, however, the classification accuracy of the minority classes is often more important. An effective remedy is to rebalance the imbalanced data through data augmentation. The goal is to generate new minority-class samples that are both strongly class-discriminative and diverse, and that genuinely help in constructing the classifier. However, because minority classes contain few instances, it is hard to capture the fundamental characteristics of the imbalanced data distribution. Moreover, it is difficult to generate high-quality samples from the minority classes alone. Existing methods address these issues only partially, which leads to low-quality generated samples and inaccurate classifiers. To better solve these problems and improve classification accuracy on imbalanced data, this thesis proposes two new auto-encoder-based methods for imbalanced data augmentation and imbalanced classification.

To address the difficulty of capturing the fundamental characteristics of the imbalanced data distribution, Supervised Class Distribution Learning for GANs-based Classification (SCDL-GAN) is proposed. SCDL-GAN is an imbalanced classification framework with two stages. The first stage aims to accurately determine the class distributions using a supervised class distribution learning method under the Wasserstein auto-encoder framework. The second stage uses generative adversarial networks to simultaneously generate instances according to the learned class distributions and mine the discriminative structure among classes to train the final classifier. Experimental results demonstrate that SCDL-GAN consistently benefits the imbalanced classification task in terms of three widely used evaluation metrics on four benchmark datasets.

To address the difficulty of transferring knowledge from majority classes to minority classes, Inter-class Information Transferring for AE-based Imbalanced Classification (Trans-AE) is proposed. The first stage of Trans-AE learns discriminative within-class information and transferable between-class information in the hidden space, based on class prototype learning and entropy maximization. The second stage combines the discriminative information of the minority classes to be augmented with the transferable information from the majority classes to generate new samples. Finally, a classifier that incorporates the class prototype information is used to predict clear boundaries between classes. Experimental results show that Trans-AE not only maintains high overall classification accuracy but also improves the classification accuracy of the minority classes.
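The motivation in the abstract (overall accuracy can hide poor minority-class performance, and augmentation can rebalance the data) can be illustrated with a small sketch. This is not SCDL-GAN or Trans-AE: the toy data, the helper names (`accuracy`, `recall`, `oversample_by_interpolation`), and the interpolation-based oversampling are all assumptions chosen for illustration, standing in for the learned generators the thesis describes.

```python
import random
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, cls):
    """Fraction of true instances of `cls` that were predicted as `cls`."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    total = sum(1 for t in y_true if t == cls)
    return hits / total


def oversample_by_interpolation(samples, labels, rng):
    """Hypothetical stand-in for a learned generator: add points interpolated
    between two random same-class samples until every class is as large as
    the largest class."""
    counts = Counter(labels)
    target = max(counts.values())
    new_samples, new_labels = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            a, b = rng.choice(pool), rng.choice(pool)
            t = rng.random()
            new_samples.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
            new_labels.append(cls)
    return new_samples, new_labels


rng = random.Random(0)

# Toy imbalanced data (assumed): 95 majority points (label 0), 5 minority (label 1).
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(95)]
X += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(5)]
y = [0] * 95 + [1] * 5

# A classifier that always predicts the majority class looks accurate overall
# but never detects the minority class.
always_majority = [0] * len(y)
acc = accuracy(y, always_majority)               # 0.95
minority_recall = recall(y, always_majority, 1)  # 0.0

# Rebalance by augmenting the minority class.
X_bal, y_bal = oversample_by_interpolation(X, y, rng)
balanced_counts = Counter(y_bal)                 # 95 samples per class
```

Interpolating between existing minority samples can only produce points inside the region those few samples already cover, which is one way to see why the abstract argues that relying on the minority classes alone limits sample quality and diversity.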
Keywords/Search Tags: Computer Science, Machine Learning, Imbalanced Classification, Data Augmentation