| Object detection is a fundamental problem in computer vision and has attracted considerable attention. Deep-learning-based models rely heavily on large amounts of high-quality, manually labelled data. However, annotating large-scale visual data for detection is expensive and time-consuming. This thesis focuses on theoretical research and technical studies of weakly supervised object detection (WSOD). Existing models suffer from many problems caused by weak supervision. Through extensive investigation and analysis, this thesis identifies three scientific problems, namely non-convex optimization, framework limitations, and practicality, and proposes corresponding solutions. The main innovations are summarized as follows: (1) We propose an object-specific pixel gradient for WSOD, which locates objects directly from gradient information and avoids the non-convex optimization problem. (2) We propose cyclic guidance, which mines global gradient information and combines it with local information to avoid the local minima of non-convex optimization. (3) We propose a fast inference framework via generative adversarial learning, which decouples the training and testing processes to achieve real-time inference. (4) We propose a unified framework, which constructs a self-contained, end-to-end training and inference framework to achieve a high-capacity general detection model. (5) We propose a deep residual network framework based on a detailed analysis, and then introduce a sequence of design principles that enable deep residual learning for WSOD. (6) We propose noise-aware, fully webly supervised object detection, which improves robustness and reduces manual annotation to enhance the practicality of WSOD. (7) We propose a parallel detection-segmentation algorithm, which applies WSOD to instance segmentation and shows that the WSOD-based method achieves excellent results. (8) We propose joint thing-and-stuff mining for weakly supervised panoptic segmentation to further explore how WSOD can cooperate with various segmentation tasks. Extensive experimental results and theoretical analysis show that the proposed methods improve the performance of the corresponding object detection and image segmentation tasks. |