Deep learning models currently place high demands on dataset scale and labeling accuracy, especially for object detection tasks. In some application scenarios, however, it is difficult and expensive to obtain a sufficiently large dataset with same-dimension annotation (where each label carries five-dimensional information covering class, location, and size) for strongly supervised learning. It is therefore of both research significance and practical value to explore how to train deep learning object detection models on incompletely annotated datasets, a task known as weakly supervised object detection. Against this background, this paper studies the problems and shortcomings of current mainstream weakly supervised object detection models and, based on theoretical and experimental analysis, proposes two improvements: a multi-feature proposal screening algorithm that balances the ratio of positive to negative samples among the proposals, and an improved multi-instance learning network based on noisy-label learning. Together, these two structures improve the overall detection performance of the weakly supervised object detection algorithm in several respects, including feature extraction, classification and detection ability, and adaptability to data noise. The main research contents and results of this paper are as follows:
(1) Through clustering and quantitative analysis of the training data, detection results, and intermediate model outputs, we hypothesize and then verify the root causes of missed detections, false detections, and especially the part domination problem in current mainstream weakly supervised object detection algorithms.
(2) Using the objectness measure and the class activation map as additional information, a proposal screening algorithm is constructed to balance the proportion of positive and negative samples among the proposals. It is then used to further optimize both model training and testing.
(3) Noisy-label learning is used to improve the multi-instance learning network of the baseline model, and adaptive pseudo-label correction training is performed to reduce the overfitting caused by missing annotation information, thereby improving the noise adaptability and accuracy of the model.
(4) The general-purpose datasets Pascal VOC 2007 and 2012 are used to compare the detection accuracy of the improved model with that of the baseline model and so verify the improvement brought by the new structures. The adaptability of the new model to specialized detection tasks is verified on the remote sensing datasets RSOD and NWPU VHR-10.v2.
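The idea behind contribution (2) can be illustrated with a minimal sketch. It is not the algorithm proposed in this paper, whose details are not given in the abstract. It only assumes that each proposal has an objectness score in [0, 1], that a class activation map (CAM) with values in [0, 1] is available for the image, and that the two cues are fused by a weighted sum. The function names `mean_cam` and `screen_proposals`, and the parameters `alpha`, `pos_thresh`, and `neg_per_pos`, are hypothetical.

```python
import random


def mean_cam(box, cam):
    """Mean CAM value inside a box given as integer pixels (x1, y1, x2, y2).

    Assumes x2 > x1 and y2 > y1 so that the box covers at least one pixel.
    """
    x1, y1, x2, y2 = box
    values = [cam[y][x] for y in range(y1, y2) for x in range(x1, x2)]
    return sum(values) / len(values)


def screen_proposals(boxes, objectness, cam,
                     alpha=0.5, pos_thresh=0.5, neg_per_pos=1, seed=0):
    """Split proposals into likely positives and a balanced set of negatives.

    The weighted-sum fusion and the fixed threshold are illustrative
    assumptions, not the rule used in the paper.
    """
    # Fuse the two cues into one score per proposal.
    combined = [alpha * o + (1 - alpha) * mean_cam(b, cam)
                for b, o in zip(boxes, objectness)]
    pos = [i for i, s in enumerate(combined) if s >= pos_thresh]
    neg = [i for i, s in enumerate(combined) if s < pos_thresh]
    # Keep at most `neg_per_pos` negatives per positive (at least one slot).
    n_neg = min(len(neg), neg_per_pos * max(len(pos), 1))
    neg_kept = sorted(random.Random(seed).sample(neg, n_neg))
    return pos, neg_kept, combined


# Toy 10x10 CAM with a single activated 4x4 region.
cam = [[1.0 if 2 <= y < 6 and 2 <= x < 6 else 0.0 for x in range(10)]
       for y in range(10)]
boxes = [(2, 2, 6, 6), (0, 0, 2, 2), (6, 6, 10, 10), (0, 6, 4, 10)]
objectness = [0.9, 0.1, 0.2, 0.3]
pos, neg_kept, combined = screen_proposals(boxes, objectness, cam)
```

In this toy input only the first box overlaps the activated region, so it is the only proposal treated as positive, and one of the three remaining proposals is sampled as a negative to keep the two groups the same size.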