
Research On Weakly Supervised Image Visual Semantic Understanding Based On Deep Learning

Posted on: 2022-12-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z Zhang
GTID: 1488306758979209
Subject: Computer application technology
Abstract/Summary:
Image visual semantic understanding is an active research topic in image processing. Existing methods based on deep convolutional neural networks often require large amounts of fine-grained annotation containing detailed object contours. Acquiring such annotation, however, costs considerable time and money, which limits both the performance of these methods and their ability to generalize to complex scenes. To address this problem, researchers have relaxed the precision required of training annotations and proposed training weakly supervised visual semantic understanding models on image-level labels. Image-level labels, however, indicate only which object categories are present in an image and carry no information about object location or contour, which poses a new challenge for weakly supervised visual semantic understanding in complex natural scenes. This dissertation focuses on two tasks in image visual semantic understanding, semantic segmentation and object detection, and optimizes the network structure under image-level supervision so as to mine object location and contour information more completely. The main work and contributions are as follows:

(1) A weakly supervised image semantic segmentation method based on dilated-convolution pixel affinity is proposed. Models trained on image-level labels tend to focus only on the most discriminative regions of a target, so the pixel-level pseudo labels derived from them are incomplete. To address this, the method introduces into the classification model a dilated convolution unit with multiple dilation rates and a self-attention mechanism, which enlarge the receptive field while adaptively enhancing target regions and suppressing irrelevant ones. The result is higher-quality pixel-level pseudo labels and a more accurate semantic segmentation model. Experimental results show that the method effectively improves pseudo-label accuracy and achieves mean Intersection over Union (mIoU) scores of 65.3% and 66.2% on the Pascal VOC 2012 validation and test sets, respectively.

(2) A single-stage weakly supervised image semantic segmentation method with attention-guided enhancement is proposed to address the added training complexity of two-stage weakly supervised semantic segmentation approaches. The classification and segmentation models are fused into a single framework, and segmentation maps are generated directly through end-to-end training under image-level supervision. An attention guidance module guides the model to learn spatial and semantic information in a bottom-up manner, and a contextual attention module captures long-range contextual dependencies between the class-specific feature maps produced by different layers of the network, adaptively enhancing object regions and suppressing noise. Experiments show that this method achieves significantly higher segmentation accuracy than other end-to-end weakly supervised semantic segmentation methods, with mIoU scores of 66.1% and 66.3% on the Pascal VOC 2012 validation and test sets, respectively.

(3) A weakly supervised object detection method based on proposal self-supervised attention learning is proposed. Under image-level supervision, most object detection methods detect only the salient parts of objects, and their results are unstable across different affine transformations of the same image. A self-supervised attention learning module is therefore proposed that uses a consistency regularization loss to minimize the differences among the original feature attention map, the balanced attention map, and the attention map generated from its affine transformation. In the proposal selection stage, high-confidence object proposals are adaptively selected as positive examples, while only class-specific object proposals are selected as hard negative examples, which facilitates training of the weakly supervised object detection model. Experimental results show that each proposed module significantly improves the detection of small objects and of multiple adjacent objects of the same class. On the Pascal VOC 2007 and Pascal VOC 2012 datasets, the proposed method achieves mean Average Precision (mAP) of 54.8% and 53.4% and Correct Localization (CorLoc) scores of 72.6% and 71.4%, respectively.

In summary, this dissertation investigates semantic segmentation and object detection in weakly supervised image visual semantic understanding and proposes corresponding models and methods that have theoretical significance and broad application value for subsequent research in this area.
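Contribution (1) relies on dilated convolutions with several dilation rates to enlarge the receptive field. The following is a minimal one-dimensional sketch of that idea in plain Python, not the dissertation's implementation: the rates (1, 3, 6), the all-ones kernel, and summation as the way the parallel branches are fused are illustrative assumptions, and the pixel-affinity and self-attention parts of the method are not modeled.

```python
def dilated_conv1d(signal, kernel, rate):
    """Zero-padded 1-D dilated convolution (cross-correlation, as in CNNs).

    Taps are spaced `rate` samples apart, so a k-tap kernel covers
    rate * (k - 1) + 1 input samples; the output has the same length
    as the input.
    """
    pad = rate * (len(kernel) - 1) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = i - pad + j * rate
            if 0 <= idx < len(signal):  # zero padding outside the signal
                acc += weight * signal[idx]
        out.append(acc)
    return out


def effective_kernel_size(k, rate):
    """Number of input samples spanned by a k-tap kernel at this dilation rate."""
    return rate * (k - 1) + 1


def multi_rate_block(signal, kernel, rates):
    """Run parallel dilated branches and fuse them by summation (an assumption)."""
    branches = [dilated_conv1d(signal, kernel, r) for r in rates]
    return [sum(values) for values in zip(*branches)]


if __name__ == "__main__":
    impulse = [0.0] * 21
    impulse[10] = 1.0
    fused = multi_rate_block(impulse, [1.0, 1.0, 1.0], rates=(1, 3, 6))
    support = [i for i, v in enumerate(fused) if v != 0.0]
    print(support)                      # [4, 7, 9, 10, 11, 13, 16]
    print(effective_kernel_size(3, 6))  # 13
```

Feeding a unit impulse through the block makes the enlarged support visible: a 3-tap kernel at rate 1 touches only 3 samples, while the rate-6 branch reaches 6 samples on either side, so the fused output spans 13 samples at the same parameter count per branch.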
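The segmentation results in contributions (1) and (2) are reported as mean Intersection over Union. A plain-Python sketch of the usual computation follows: a confusion matrix is accumulated over all evaluated pixels, per-class IoU is TP / (TP + FP + FN), and the mean is taken over classes (for Pascal VOC 2012, the 20 object categories plus background). The flattened label lists, the ignore value 255 (the void label in Pascal VOC annotations), and skipping classes absent from both prediction and ground truth are assumptions of this sketch.

```python
def confusion_matrix(preds, labels, num_classes, ignore_index=255):
    """Accumulate a num_classes x num_classes matrix; rows = ground truth, cols = prediction."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(preds, labels):
        if t == ignore_index:  # void/boundary pixels are not scored
            continue
        cm[t][p] += 1
    return cm


def mean_iou(cm):
    """Mean of per-class IoU = TP / (TP + FP + FN) over classes that occur."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # class-c pixels predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp  # other pixels predicted as class c
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both prediction and ground truth
            ious.append(tp / denom)
    return sum(ious) / len(ious)


if __name__ == "__main__":
    labels = [0, 0, 1, 1, 2, 255]  # flattened ground truth, 255 = ignore
    preds = [0, 1, 1, 1, 2, 0]     # flattened prediction
    cm = confusion_matrix(preds, labels, num_classes=3)
    print(round(mean_iou(cm), 4))  # 0.7222  (= (1/2 + 2/3 + 1) / 3)
```

Under the usual VOC convention the matrix is accumulated over every pixel of every image in the split before the ratios are taken, rather than averaging per-image scores.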
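Contribution (3) uses a consistency regularization loss so that attention maps remain consistent under affine transformations of the input. The sketch below illustrates only that principle: the horizontal flip as the transform, mean squared error as the distance, and the two toy attention functions are hypothetical stand-ins rather than the dissertation's module, and the balanced attention map mentioned in the abstract is not modeled. The requirement checked is that flipping the attention map of the original image should match the attention map computed on the flipped image.

```python
def hflip(grid):
    """Horizontal flip of a 2-D map stored as a list of rows."""
    return [list(reversed(row)) for row in grid]


def consistency_loss(att_of_original, att_of_transformed, transform=hflip):
    """Mean squared difference between T(A(x)) and A(T(x))."""
    aligned = transform(att_of_original)
    total, count = 0.0, 0
    for row_a, row_b in zip(aligned, att_of_transformed):
        for a, b in zip(row_a, row_b):
            total += (a - b) ** 2
            count += 1
    return total / count


def flip_consistency(attention_fn, image):
    """Consistency penalty of an attention function under a horizontal flip."""
    return consistency_loss(attention_fn(image), attention_fn(hflip(image)))


def normalized_attention(image):
    """Toy attention: each pixel scaled by the global maximum (position-independent)."""
    peak = max(max(row) for row in image)
    return [[v / peak for v in row] for row in image]


def position_biased_attention(image):
    """Toy flawed attention: weights pixels by column index, so its output
    does not follow the image content when the image is flipped."""
    return [[v * (j + 1) for j, v in enumerate(row)] for row in image]


if __name__ == "__main__":
    img = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    print(flip_consistency(normalized_attention, img))       # 0.0
    print(flip_consistency(position_biased_attention, img))  # 41.333... (= 248 / 6)
```

A position-independent attention function is flip-equivariant and incurs zero penalty, while the position-biased one is penalized; in a training setup such a term would typically be added to the detection loss and minimized jointly.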
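Detection accuracy in contribution (3) is reported as mean Average Precision (mAP), the mean of per-class AP over the 20 Pascal VOC object classes. The function below is a sketch of per-class AP computed from detections that are already sorted by descending score and already matched to ground truth (the matching step, usually greedy at IoU ≥ 0.5 with duplicates counted as false positives, is assumed to be done). It offers both the 11-point interpolation of the VOC 2007 protocol and the all-point interpolation of later VOC development kits; the abstract does not state which variant was used.

```python
def average_precision(tp_flags, num_gt, eleven_point=True):
    """AP for one class from detections sorted by descending score.

    tp_flags[i] is 1 if the i-th detection matched an unclaimed ground-truth
    box and 0 otherwise; num_gt (> 0) is the number of ground-truth boxes.
    """
    recalls, precisions = [], []
    tp = fp = 0
    for flag in tp_flags:
        if flag:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))

    if eleven_point:
        # VOC2007 style: mean of the max precision at recall >= 0.0, 0.1, ..., 1.0
        total = 0.0
        for i in range(11):
            t = i / 10
            candidates = [p for r, p in zip(recalls, precisions) if r >= t]
            total += max(candidates) if candidates else 0.0
        return total / 11

    # All-point interpolation: area under the monotone precision envelope
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    return sum((mrec[i] - mrec[i - 1]) * mpre[i]
               for i in range(1, len(mrec)) if mrec[i] != mrec[i - 1])


if __name__ == "__main__":
    flags = [1, 0, 1]  # sorted detections: hit, miss, hit
    print(round(average_precision(flags, num_gt=2), 4))                      # 0.8485 (= 28/33)
    print(round(average_precision(flags, num_gt=2, eleven_point=False), 4))  # 0.8333 (= 5/6)
```

The mAP figure is then the arithmetic mean of these per-class values over all classes.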
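The Correct Localization (CorLoc) scores in contribution (3) are commonly defined, in weakly supervised detection, as the fraction of images containing a given class in which the highest-scoring box for that class overlaps some ground-truth box of that class with IoU ≥ 0.5, averaged over classes. The sketch below assumes axis-aligned (x1, y1, x2, y2) boxes and one (class, top box, ground-truth boxes) record per image-class pair; these data structures are illustrative and not taken from the dissertation.

```python
from collections import defaultdict


def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def corloc(records, iou_threshold=0.5):
    """Mean over classes of the fraction of positive images correctly localized.

    records: iterable of (class_id, top_box, gt_boxes), one entry per image
    that contains class_id; top_box is the highest-scoring predicted box
    for that class in that image.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for cls, top_box, gt_boxes in records:
        totals[cls] += 1
        if any(box_iou(top_box, gt) >= iou_threshold for gt in gt_boxes):
            hits[cls] += 1
    per_class = [hits[c] / totals[c] for c in totals]
    return sum(per_class) / len(per_class)


if __name__ == "__main__":
    records = [
        (0, (0, 0, 10, 10), [(0, 0, 10, 10)]),  # IoU 1.0 -> hit
        (0, (0, 0, 10, 10), [(5, 5, 15, 15)]),  # IoU 25/175 -> miss
        (1, (0, 0, 10, 10), [(0, 0, 10, 12)]),  # IoU 100/120 -> hit
    ]
    print(corloc(records))  # 0.75
```

Because only the single top-scoring box per class and image is examined, CorLoc rewards correct localization of at least one instance and, unlike mAP, does not penalize missed additional instances or low-ranked false positives.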
Keywords/Search Tags: computer vision, semantic understanding, weakly supervised learning, semantic segmentation, object detection