Research On Sample Imbalance Problem Based On Deep Learning

Posted on:2022-12-23

Degree:Master

Type:Thesis

Country:China

Candidate:Y Zheng

GTID:2518306755497504

Subject:Master of Engineering

Abstract/Summary:

Data is indispensable to deep learning, and every vision task relies on a corresponding dataset. Classifiers are usually trained under the basic assumption that the number of samples in each category of the dataset is roughly balanced. Modern deep learning methods perform well on such uniform distributions; in the natural world, however, sample counts follow a long-tailed distribution, and this imbalance poses a great challenge for training deep learning models and applying them in practice. To address the sample imbalance problem, which has received much attention in recent years, this paper defines several common forms of imbalance and reconstructs relevant open-source datasets to exhibit them as research objects. For sample imbalance in computer vision, the idea of reweighting is used to study deep learning for image classification and object detection. The main research work of this paper includes:

(1) We first investigate the methods commonly used in industry for imbalanced datasets in image classification. A detailed comparative analysis of existing resampling and reweighting methods is performed. Experiments are conducted on three benchmark datasets, CIFAR-10, CIFAR-100, and ImageNet-Tiny, using the ResNet18 model. The reweighting method is found to be better and more stable in dealing with the sample imbalance problem.

(2) Because of the strong performance of reweighting on the sample imbalance problem, this paper studies the reweighting method further. The effects of three factors on the reweighting results are considered: the number of categories, the number of samples, and the degree of category imbalance. In previous studies, the weights depend only on the number of samples per category, and relying on that information alone is too coarse for a weighting method that is sensitive to the size of the weights. The existing effective sample calculation method is improved using the properties of the three datasets themselves, thereby optimizing the existing effective sample loss, and an adaptive effective sample weighting method is implemented. The effectiveness of the method is demonstrated on the constructed long-tail CIFAR dataset.

(3) For the imbalance between hard and easy samples in object detection, this paper proposes a cascade optimization strategy based on Cascade R-CNN. The quality of the region proposals is gradually improved during training while positive and negative samples are balanced, and the regression loss balances the gradients that hard and easy samples produce in bounding-box regression, so that hard and easy samples are balanced throughout detector training. The method is validated on the SKU-110K and MS-COCO 2017 datasets, and experiments show that it improves the detection accuracy of the detector.
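As a rough illustration of the reweighting idea in item (2), the sketch below builds long-tailed per-class sample counts and derives class weights from the widely used "effective number of samples" formula, E_n = (1 - beta^n) / (1 - beta). This is assumed to be the existing baseline the abstract refers to; the adaptive variant proposed in the thesis is not specified in the abstract and is not reproduced here. The function names, the exponential decay profile, the value of beta, and the example sizes are all illustrative assumptions.

```python
def long_tail_counts(n_max, num_classes, imbalance_ratio):
    """Per-class counts decaying exponentially from n_max to about n_max / imbalance_ratio.

    Hypothetical helper: the exact construction used in the thesis is not given.
    """
    if num_classes == 1:
        return [n_max]
    return [
        int(n_max * (1.0 / imbalance_ratio) ** (i / (num_classes - 1)))
        for i in range(num_classes)
    ]


def effective_number_weights(counts, beta=0.999):
    """Class weights proportional to 1 / E_n, with E_n = (1 - beta**n) / (1 - beta).

    beta is an assumed hyperparameter in [0, 1); the weights are rescaled
    so that they sum to the number of classes.
    """
    effective = [(1.0 - beta ** n) / (1.0 - beta) for n in counts]
    raw = [1.0 / e for e in effective]
    scale = len(counts) / sum(raw)
    return [w * scale for w in raw]


# Example: a CIFAR-10-like setting with 5000 samples in the largest class
# and an imbalance ratio of 100 between the largest and smallest class.
counts = long_tail_counts(n_max=5000, num_classes=10, imbalance_ratio=100)
weights = effective_number_weights(counts, beta=0.999)
for c, w in zip(counts, weights):
    print(f"{c:5d} samples -> weight {w:.3f}")
```

The resulting list could be passed as per-class weights to a weighted cross-entropy loss, so that rare classes contribute more to the gradient. Because the weights depend only on the per-class counts, this baseline shows the limitation the abstract points out.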

Keywords/Search Tags:

Deep Learning, Data Imbalance, Long-tail Distribution, Image Classification, Object Detection

Related items

1	Research On Target Detection Algorithm Based On Long Tail Distribution Data Set
2	Research On Classification Algorithm Of Ancient Chinese Characters Based On "Long Tail Distribution"
3	Label Structure Based Deep Learning For Long-tail Distributed Classification
4	Towards Entity Relation Extraction On Long-tailed Data Distribution
5	Research On Object Detection Algorithm Based On Deep Learning
6	The Imbalance Problem In Object Detection Based On Deep Learning
7	The Foreground-Foreground Class Imbalance Problem In Object Detection Based On Deep Learning
8	Deep Feature Learning For Data Irregular Distribution
9	Distantly Supervised Relation Extraction Method And Its Application Based On Deep Learning
10	Research On Web Service Classification To Address Semantic-sparsity And Sample-sparsity Issues