
Research On Classification Of Imbalanced Data Set Based On Deep Learning

Posted on: 2022-10-02
Degree: Master
Type: Thesis
Country: China
Candidate: X T Peng
Full Text: PDF
GTID: 2518306602455964
Subject: Computer Science and Technology
Abstract/Summary:
As image classification becomes increasingly common in daily life, the demand for training models with high classification accuracy continues to grow. In recent years, with the success of deep learning in visual tasks, a series of classic models have been proposed. However, when the training dataset is imbalanced, these models are prone to problems such as overfitting, instability, and poor generalization. Existing approaches fall mainly into two categories: resampling and reweighting. Both can mitigate data imbalance to some extent, but the models designed with these methods are usually very large and difficult to integrate into deep neural networks for end-to-end training; more fundamentally, the model still lacks effective information about the minority classes. How to strengthen training on the minority classes while achieving end-to-end training of the network is therefore the key difficulty in improving these methods. To address these problems, this thesis designs two novel methods to improve classification performance on imbalanced datasets.

1. The thesis proposes a Cross Entropy-Focal Loss and a Cos-Sin Loss based on the degree of imbalance, called Imbalance Cross Entropy-Focal Loss (I-CEFL Loss) and Imbalance Cos-Sin Loss (I-CosSin Loss), which combine Cross-Entropy Loss and Focal Loss with a scaling factor. According to the distribution of the dataset, the thesis redefines the Imbalance Degree (ID), uses it as the decision indicator of a weight function, and redistributes the per-class weights so that training pays more attention to the contribution of the minority classes. Different scaling factors control the proportion of Cross-Entropy Loss and Focal Loss, making training focus more on misclassified samples. Experiments on the Kaggle Mushroom Dataset, hand-constructed imbalanced CIFAR-10/100, and DRIVE show that the proposed I-CEFL Loss and I-CosSin Loss significantly improve the classification accuracy on imbalanced datasets, especially when the data is extremely imbalanced. Moreover, the proposed weight function yields a further performance improvement.

2. In view of existing methods that can only solve the problem of a dataset corresponding to a specific task, the thesis proposes a new regularization technique called Imbalance Degree Mixup (ID-Mixup), which relaxes the conditions of Mixup so that the mixing factors for samples and labels are considered separately. When mixing two samples, the sample synthesis is the same as in Mixup, but ID-Mixup assigns labels that support the minority class by giving it a disproportionately higher weight. The classifier therefore gradually pushes the decision boundary toward the majority class during training, balancing the generalization error between the majority and minority classes. The thesis studies the state-of-the-art regularization techniques Mixup, Manifold Mixup, and CutMix under class imbalance. The results show that the proposed ID-Mixup significantly outperforms these techniques, as well as several common reweighting and resampling methods, on the naturally imbalanced Mushroom Dataset and on imbalanced datasets constructed from CIFAR-10/100. Furthermore, ID-Mixup can be combined with resampling and reweighting to achieve superior performance.
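The combination of Cross-Entropy Loss and Focal Loss via a scaling factor can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the blend weight `alpha`, the focusing parameter `gamma`, and the per-probability form below are assumptions for illustration, not the thesis's definition:

```python
import numpy as np

def cross_entropy(p):
    # Standard cross-entropy for the true-class probability p: -log(p)
    return -np.log(p)

def focal_loss(p, gamma=2.0):
    # Focal loss down-weights well-classified samples: -(1 - p)^gamma * log(p)
    return -((1.0 - p) ** gamma) * np.log(p)

def cefl_loss(p, alpha=0.5, gamma=2.0):
    # Hypothetical blend: a scaling factor alpha controls the proportion
    # of Cross-Entropy Loss versus Focal Loss, as the abstract describes.
    return alpha * cross_entropy(p) + (1.0 - alpha) * focal_loss(p, gamma)
```

For a confidently correct prediction (p close to 1) the focal term nearly vanishes, so lowering `alpha` shifts the training signal toward hard, misclassified samples.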
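The ID-Mixup idea of mixing inputs as in standard Mixup while weighting labels toward the minority class can be sketched as follows. The `boost` factor and the one-hot weighting scheme are illustrative assumptions; the thesis's actual label assignment rule is not specified in the abstract:

```python
import numpy as np

def id_mixup(x1, y1, x2, y2, lam, minority_class, boost=2.0):
    """Mix a pair of samples Mixup-style, but tilt the label toward the minority.

    x1, x2: input arrays; y1, y2: one-hot label vectors.
    lam: Mixup mixing factor in [0, 1]; boost > 1 is a hypothetical
    weight applied to the minority class's label contribution.
    """
    # Inputs are synthesized exactly as in standard Mixup.
    x = lam * x1 + (1.0 - lam) * x2
    # Labels receive separate mixing factors: the minority class gets a
    # disproportionately higher weight, then weights are renormalized.
    w1 = lam * (boost if np.argmax(y1) == minority_class else 1.0)
    w2 = (1.0 - lam) * (boost if np.argmax(y2) == minority_class else 1.0)
    y = (w1 * y1 + w2 * y2) / (w1 + w2)
    return x, y
```

With `lam = 0.5` and `boost = 2.0`, a minority/majority pair yields a label of roughly [2/3, 1/3] instead of Mixup's [1/2, 1/2], which is the mechanism that pushes the decision boundary toward the majority class.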
Keywords/Search Tags: imbalanced dataset, image classification, deep learning, convolutional neural network, Cross-Entropy Loss, Focal Loss, regularization technique