Font Size: a A A

Weakly Supervised Semantic Scene Understanding

Posted on:2019-05-23Degree:DoctorType:Dissertation
Country:ChinaCandidate:B S LaiFull Text:PDF
GTID:1368330545461282Subject:Information and Communication Engineering
Abstract/Summary:PDF Full Text Request
Semantic scene understanding is an important research field in computer vision,which is an essential tool that helps computer understand the world in a vision-based way.Semantic scene understanding has extensive applications,including autonomous driving,intelligent monitoring,intelligent traffic and so on.However,the high-performance algorithms in semantic scene under-standing require a large amount of pixel-level or instance-level annotations during training,which is extremely expensive.To reduce the burden of annotation,researchers propose semantic scene understanding under weak supervision,where only image-level annotations are required.Weakly supervised semantic scene understanding is promising but it is an extremely challeng-ing problem.The main reason is that image-level annotations can not offer fine-grained localization cues,thus current algorithms can not distinguish targets from different classes.To solve this prob-lem,we propose two principle ways to recover the localization cues and further use them to guide the training of semantic scene understanding algorithms.The first way introduces prior information from outside to recover the location of objects,while the second way tries to discover localization cues inside the neural networks.Since semantic scene understanding is a very broad research filed,in this dissertation,we focus on semantic segmentation and object detection tasks,which belong to low-level semantic scene understanding.Specifically,the main contributions of this dissertation are the following four aspects.Firstly,this dissertation presents a saliency guided weakly supervised semantic segmentation method.This method introduces learnable weights into conventional dictionary learning formula-tion so that it can adapt to the weakly supervised scenario.The weight variable in the new formula-tion can encode the missing location information.In order to restore accurate location information,this method incorporates saliency prior to guide the learning of weights.In addition,it utilizes a dictionary clustering term to get clean dictionary and uses a smoothness term to produce smooth segmentation results.Experimental results in MSRC21,VOC2007 and VOC2012 indicate that every term in the formulation can improve the performance.The comparison with existing meth-ods shows that the proposed method outperforms conventional methods and it is comparable with deep-based methods in early years.Secondly,this dissertation presents a class-specific saliency guided weakly supervised object detection method.This method discovers localization cues under the guidance of class-specific saliency and then uses them to guide object detection.We first propose a context-aware way to select highly confident seed proposals with location information.To address the label imbalance problem of proposals,we introduce an objectness prediction sub-network to weight proposals ac-cording to their objectness values.It shows that the weighting scheme is able to suppress negative proposals.In order to leverage the location information in seeds,we use them to supervise both the proposal classification sub-network and the objectness prediction sub-network.Experiments on VOC2007 and VOC2012 datasets show that each contribution of this method can improve the detection performance and the proposed method outperforms the state-of-the-arts.Thirdly,this dissertation proposes a segmentation-aware weakly supervised object detection method.It incorporates a weakly supervised semantic segmentation branch into the main object detection network to obtain pixel-level location information from network features.We propose a new global pooling layer,that is global dynamic pooling that computes the contribution of each pixel to image-level scores in a dynamic way.This global pooling layer can overcome the disad-vantages of exiting global pooling layers.To make the location information in segmentation usable for detection,we propose a seed selection sub-network that can select highly confident seeds from all proposals,which can be used to supervise the detection network.Experiments on VOC2007 and VOC2012 verify the effectiveness of each sub-network.They also show that the proposed method significantly outperforms the state-of-the-arts.Finally,this dissertation proposes a multi-scale object discovery method for weakly supervised semantic segmentation.The multi-scale object discovery network predict object location under each scale respectively,then a multi-scale voting is utilized to merge localization cues of all scales to discover precise object location.In this network we incorporate three losses that take image-level tags,the smoothness constraint and the initial localization cues into consideration.After objects are discovered,a fully supervised segmentation network is trained to get the final segmentation results.We conduct both qualitative and quantitative experiments on VOC2012 dataset.The results verify the effectiveness of multi-scale object discovery network and they indicate that our method can outperform existing methods.
Keywords/Search Tags:semantic scene understanding, weakly supervised learning, object detection, semantic segmentation, dictionary learning
PDF Full Text Request
Related items