
Research On Classification Of Imbalanced Data Set Based On Deep Learning

Posted on: 2022-10-02
Degree: Master
Type: Thesis
Country: China
Candidate: X T Peng
Full Text: PDF
GTID: 2518306602455964
Subject: Computer Science and Technology
Abstract/Summary:
As image classification becomes increasingly common in daily life, the demand for training models with high classification accuracy continues to grow. In recent years, with the success of deep learning in visual tasks, a series of classic models have been proposed. However, when the training dataset is imbalanced, these models are prone to problems such as overfitting, instability, and poor generalization. Existing approaches fall mainly into two categories: resampling and reweighting. Both can mitigate data imbalance to some extent, but the models designed with these methods are usually very large and difficult to integrate into deep neural networks for end-to-end training; more fundamentally, the model still lacks effective information about the minority classes. How to strengthen training on the minority classes while achieving end-to-end training of the network is therefore the key difficulty in improving these methods. To address these problems, this thesis designs two novel methods to improve classification performance on imbalanced datasets.

1. The thesis proposes a Cross Entropy-Focal Loss and a Cos-Sin Loss based on the degree of imbalance, called Imbalance Cross Entropy-Focal Loss (I-CEFL Loss) and Imbalance Cos-Sin Loss (I-CosSin Loss), which combine Cross-Entropy Loss and Focal Loss with a scaling factor. According to the distribution of the dataset, the thesis redefines the Imbalance Degree (ID), uses it as the decision indicator of a weight function, and redistributes the per-class weights so that training pays more attention to the contribution of the minority classes. Different scaling factors control the proportion of Cross-Entropy Loss and Focal Loss, making training focus more on misclassified samples. Experiments on the Kaggle Mushroom Dataset, hand-constructed imbalanced CIFAR-10/100, and DRIVE show that the proposed I-CEFL Loss and I-CosSin Loss significantly improve the classification accuracy on imbalanced datasets, especially when the data is extremely imbalanced. Moreover, the proposed weight function yields a further performance improvement.

2. In view of existing methods that can only solve the problem of a dataset corresponding to a specific task, the thesis proposes a new regularization technique called Imbalance Degree Mixup (ID-Mixup), which relaxes the conditions of Mixup so that the mixing factors for samples and labels are considered separately. When mixing two samples, the sample synthesis is the same as in Mixup, but ID-Mixup assigns labels that support the minority class by giving it a disproportionately higher weight. The classifier therefore gradually pushes the decision boundary toward the majority class during training, balancing the generalization error between the majority and minority classes. The thesis studies the state-of-the-art regularization techniques Mixup, Manifold Mixup, and CutMix under class imbalance. The results show that the proposed ID-Mixup significantly outperforms these techniques, as well as several common reweighting and resampling methods, on the naturally imbalanced Mushroom Dataset and on imbalanced datasets constructed from CIFAR-10/100. Furthermore, ID-Mixup can be combined with resampling and reweighting to achieve superior performance.
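The combination of Cross-Entropy Loss and Focal Loss via a scaling factor can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the blend weight `alpha`, the focusing parameter `gamma`, and the per-probability form below are assumptions for illustration, not the thesis's definition:

```python
import numpy as np

def cross_entropy(p):
    # Standard cross-entropy for the true-class probability p: -log(p)
    return -np.log(p)

def focal_loss(p, gamma=2.0):
    # Focal loss down-weights well-classified samples: -(1 - p)^gamma * log(p)
    return -((1.0 - p) ** gamma) * np.log(p)

def cefl_loss(p, alpha=0.5, gamma=2.0):
    # Hypothetical blend: a scaling factor alpha controls the proportion
    # of Cross-Entropy Loss versus Focal Loss, as the abstract describes.
    return alpha * cross_entropy(p) + (1.0 - alpha) * focal_loss(p, gamma)
```

For a confidently correct prediction (p close to 1) the focal term nearly vanishes, so lowering `alpha` shifts the training signal toward hard, misclassified samples.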
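The ID-Mixup idea of mixing inputs as in standard Mixup while weighting labels toward the minority class can be sketched as follows. The `boost` factor and the one-hot weighting scheme are illustrative assumptions; the thesis's actual label assignment rule is not specified in the abstract:

```python
import numpy as np

def id_mixup(x1, y1, x2, y2, lam, minority_class, boost=2.0):
    """Mix a pair of samples Mixup-style, but tilt the label toward the minority.

    x1, x2: input arrays; y1, y2: one-hot label vectors.
    lam: Mixup mixing factor in [0, 1]; boost > 1 is a hypothetical
    weight applied to the minority class's label contribution.
    """
    # Inputs are synthesized exactly as in standard Mixup.
    x = lam * x1 + (1.0 - lam) * x2
    # Labels receive separate mixing factors: the minority class gets a
    # disproportionately higher weight, then weights are renormalized.
    w1 = lam * (boost if np.argmax(y1) == minority_class else 1.0)
    w2 = (1.0 - lam) * (boost if np.argmax(y2) == minority_class else 1.0)
    y = (w1 * y1 + w2 * y2) / (w1 + w2)
    return x, y
```

With `lam = 0.5` and `boost = 2.0`, a minority/majority pair yields a label of roughly [2/3, 1/3] instead of Mixup's [1/2, 1/2], which is the mechanism that pushes the decision boundary toward the majority class.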
Keywords/Search Tags: imbalanced dataset, image classification, deep learning, convolutional neural network, Cross-Entropy Loss, Focal Loss, regularization technique