Data Based Oversampling In Imbalanced Data Classification

Posted on: 2020-11-14 | Degree: Master | Type: Thesis
Country: China | Candidate: Y X Luo
GTID: 2370330575464699 | Subject: Applied Statistics
Abstract/Summary:
Imbalanced data refers to data whose target variable is categorical and in which two or more categories are severely unequal in size. For example, in credit data the numbers of default and non-default samples are often highly imbalanced. Over the past three decades, solutions to this problem have mainly taken three forms: undersampling, oversampling, and algorithm-level improvements. Oversampling methods have been popular in recent years; they include simple random resampling and the generation of new samples, as represented by SMOTE. Undersampling is relatively rare in current research, and algorithm-level improvements are difficult to carry out because of their high complexity. This paper therefore approaches the problem from the sampling side. Traditional sampling methods, whether they draw existing samples or generate new ones, cannot adapt to the spatial structure of the data set, so they fail to exploit the information contained in the existing samples. Following the idea of sampling according to the structure of the data, this paper resamples the data set using the properties of the denoising autoencoder. The effectiveness of the proposed method is assessed in two ways. The first compares the distributions produced by the proposed method and by other sampling methods against the original distribution; the distribution produced by the proposed method agrees with intuition. The second uses the final classification metrics as the basis for judging each method. The two assessments complement each other and demonstrate the method's effectiveness from two angles. Finally, the sampling method is applied to classification to address various problems that arise in imbalanced data classification.
Keywords/Search Tags: imbalanced data classification, autoencoder, oversampling
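The abstract does not spell out the exact resampling procedure. As an illustration only, here is a minimal NumPy sketch of one way denoising-autoencoder oversampling can work, assuming a one-hidden-layer autoencoder trained on the standardized minority class with Gaussian input corruption. New minority rows are produced by corrupting randomly chosen minority rows and passing them through the trained network. The function names `train_dae`, `dae_oversample`, and `rebalance`, and every hyperparameter (hidden width, noise level, learning rate, epochs), are hypothetical and not taken from the thesis.

```python
import numpy as np


def train_dae(X, hidden=8, noise_std=0.3, lr=0.05, epochs=500, seed=0):
    """Fit a one-hidden-layer denoising autoencoder to the rows of X (n x d).

    Each epoch corrupts X with Gaussian noise, encodes with tanh, decodes
    linearly, and takes one full-batch gradient step on the squared
    reconstruction error (averaged over rows) against the *clean* X.
    Hypothetical helper: architecture and hyperparameters are assumptions.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, d))
    b2 = np.zeros(d)
    for _ in range(epochs):
        Xn = X + rng.normal(0.0, noise_std, X.shape)  # corrupt the inputs
        H = np.tanh(Xn @ W1 + b1)                     # encoder
        Xh = H @ W2 + b2                              # linear decoder
        G = 2.0 * (Xh - X) / n                        # dLoss/dXh
        gW2, gb2 = H.T @ G, G.sum(axis=0)
        GH = (G @ W2.T) * (1.0 - H ** 2)              # backprop through tanh
        gW1, gb1 = Xn.T @ GH, GH.sum(axis=0)
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return W1, b1, W2, b2


def dae_oversample(X_min, n_new, noise_std=0.3, seed=0):
    """Generate n_new synthetic minority rows with a DAE fitted to X_min (hypothetical)."""
    rng = np.random.default_rng(seed + 1)
    mu, sd = X_min.mean(axis=0), X_min.std(axis=0) + 1e-8
    Z = (X_min - mu) / sd                        # standardize the minority class
    W1, b1, W2, b2 = train_dae(Z, noise_std=noise_std, seed=seed)
    idx = rng.integers(0, len(Z), size=n_new)    # random minority "seed" rows
    Zn = Z[idx] + rng.normal(0.0, noise_std, (n_new, Z.shape[1]))  # corrupt them
    Zs = np.tanh(Zn @ W1 + b1) @ W2 + b2         # map back toward the minority data
    return Zs * sd + mu                          # undo standardization


def rebalance(X, y, minority=1, seed=0):
    """Append DAE-generated minority rows so the minority class matches the rest (binary case)."""
    X_min = X[y == minority]
    n_new = max(int((y != minority).sum()) - len(X_min), 0)
    X_new = dae_oversample(X_min, n_new, seed=seed)
    return np.vstack([X, X_new]), np.concatenate([y, np.full(n_new, minority)])


# Toy example (synthetic data, illustration only): 200 majority rows, 20 minority rows.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(3.0, 0.5, (20, 2))])
y = np.array([0] * 200 + [1] * 20)
Xb, yb = rebalance(X, y)
print(np.bincount(yb))  # → [200 200]
```

In this sketch the autoencoder is trained to map corrupted points back toward the minority data, so generated rows stay near the existing minority structure instead of lying on straight segments between a sample and one of its nearest minority neighbours, as SMOTE's do. How closely this matches the procedure in the thesis is an assumption.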