Research Of Object Detection At Large Scale Based On Deep Learning

Posted on: 2023-12-14
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J R Tan
Full Text: PDF
GTID: 1528307316951099
Subject: Software engineering
Abstract/Summary:
Object detection aims to localize and classify every object in a single image. As a fundamental task in computer vision, it plays an important role in many downstream tasks and fields, such as pedestrian detection, pose estimation, autonomous driving, and intelligent security. Conventional object detection relies mainly on handcrafted functions to extract features and train models. However, those models perform poorly and lack robustness in complex scenarios, so they cannot meet practical needs. In recent years, with the help of massive amounts of data and the emergence of deep learning, object detection has made great progress. Deep learning based detectors learn relatively robust feature representations from large amounts of labeled data and generalize well. They are capable of handling object deformation, poor lighting conditions, crowding, occlusion, and similar difficulties in complex scenes.

On the other hand, the current data scale is still far from sufficient. The data used to train an object detection system usually numbers only in the tens of thousands of samples, and the number of categories is usually below one hundred. To obtain a better object detector, and to provide a better pre-trained model for downstream tasks, it is desirable to train object detection systems on larger-scale data, for example more than one hundred million samples covering over one thousand categories. However, training an object detector successfully at such a scale is very difficult: data imbalance, scarcity of annotations, and low training efficiency have greatly restricted the development of large-scale object detection systems. This dissertation focuses on how to successfully train deep learning based object detection and recognition systems at large scale, and it reports extensive studies on the topic. The main innovations and contributions are as follows:

(1) This dissertation analyzes the background and significance of large-scale object detection and shows the importance of this research direction. It then gives an overview and in-depth analysis of the main problems in the field, summarizes existing solutions at home and abroad together with their shortcomings, and points out the main research path for large-scale object detection in the future.

(2) To understand and solve the long-tail problem in large-scale object detection, i.e., the problem of extreme data imbalance, we conduct a detailed analysis of why object detection performs poorly in the large-scale long-tail scenario. We find that, under the conventional loss function, tail categories are excessively suppressed by negative gradients. By modifying the widely used cross-entropy loss so that some of the discouraging negative gradients for tail categories are blocked, we propose an equalization loss function (Equalization Loss). Follow-up experiments verify that this idea carries over to other datasets, and it also achieves good results on the long-tail classification task.

(3) To further understand the essence of the long-tail problem in large-scale object detection, we extend the analysis behind the equalization loss. Through statistical observation of the training process, we find that the ratio of positive to negative gradients received by each category classifier can be used as an indicator of whether that category is being trained in a balanced way. An imbalanced positive-to-negative gradient ratio makes the entire training process dominated by head categories while tail categories are neglected. Based on this observation, a gradient-guided reweighting method, Equalization Loss V2, is proposed. This method uses accumulated gradient ratios to adjust the training process of each category independently and dynamically, so that every category is trained in a balanced way in large-scale long-tail scenarios. As a higher-level indicator, the positive-to-negative gradient ratio generalizes well: hyperparameters determined on one dataset can be used directly on another dataset without further tuning.

(4) A framework is proposed for improving object detection systems in large-scale long-tail scenes by using weakly labeled images. We find that, even when the data is large in scale, the absolute number of samples for some minority categories is very small. Since deep learning based methods are data-hungry, this lack of data inevitably causes the model to neglect tail categories and leads to overfitting and poor generalization on them. Considering that a large amount of weakly labeled image classification data can be collected from the Internet and the community, this research improves the performance and robustness of the model through joint training on detection and classification data. The method can exploit ten times as much classification data as detection data while introducing negligible computational cost.

(5) To address the low efficiency of large-scale object detection training, we propose a large batch training method. This method can significantly reduce training time by increasing the number of training devices without loss of accuracy. When the number of training images is about 10k, the training time is reduced from twelve hours to one hour. We analyze the training instability and accuracy loss that occur when training object detectors with large batches, and find that the difficulty arises mainly in the early stage of training. We use a novel batch warmup method to stabilize training and maintain accuracy. We also solve the problem of inaccurate statistics in the Batch Normalization layer during large batch training by introducing an inter-group EMA (exponential moving average) strategy.
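
The idea in contribution (2), blocking negative gradients that would otherwise suppress tail categories, can be illustrated with a minimal sketch. It is not the dissertation's exact formulation, which the abstract does not give. The sketch assumes a per-class sigmoid cross-entropy, a per-class frequency list, and a hard frequency threshold that decides which categories count as tail. The names `tail_masked_bce`, `class_freq`, and `tail_threshold` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def tail_masked_bce(logits, target, class_freq, tail_threshold):
    """Per-class sigmoid cross-entropy for one sample, with negative terms
    of tail classes masked out.

    logits:         list of raw scores, one per class
    target:         index of the ground-truth class
    class_freq:     assumed per-class frequency statistic (hypothetical)
    tail_threshold: classes with frequency below this count as tail (assumption)

    Returns (loss, grads), where grads[j] is d(loss)/d(logits[j]).
    """
    loss = 0.0
    grads = []
    for j, z in enumerate(logits):
        p = sigmoid(z)
        y = 1.0 if j == target else 0.0
        is_tail = class_freq[j] < tail_threshold
        # Weight 0 only when a tail class would be pushed down as a negative;
        # positive terms and all head-class terms keep weight 1.
        w = 0.0 if (is_tail and y == 0.0) else 1.0
        loss -= w * (y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
        grads.append(w * (p - y))
    return loss, grads


# Three classes; class 2 is rare. The sample belongs to head class 0.
loss, grads = tail_masked_bce(
    logits=[0.0, 0.0, 0.0],
    target=0,
    class_freq=[0.6, 0.399, 0.001],
    tail_threshold=0.01,
)
print(loss, grads)
```

In this example the rare class 2 receives zero gradient from a sample of class 0, while head class 1 is still pushed down as usual. When a sample of class 2 itself appears, its positive term is kept, so the tail classifier still learns from its own examples.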
Keywords/Search Tags: object detection, convolutional neural network, large scale object detection, long-tail object detection, large batch object detection, long-tail recognition, weakly semi-supervised object detection