Multi-granularity Visual Understanding Based On Context Modeling

Posted on: 2021-04-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: T Liu
Full Text: PDF
GTID: 1488306560985829
Subject: Signal and Information Processing
Abstract/Summary:
As one of the key techniques of artificial intelligence systems, visual understanding has high potential for applications such as self-driving and satellite monitoring. Given the complexity of scenes and the particularity of tasks in practical applications, achieving more precise, multi-granularity visual understanding that meets high-precision requirements has become a crucial problem. In this dissertation, we mainly study object detection and semantic segmentation based on deep neural networks to address the challenges of small objects and semantic boundary regions. By mining the potential contexts of different tasks, we propose a series of novel deep neural network models based on context modeling that understand visual information from the region level to the object level and on to the part level. The main contributions are summarized as follows:

· For object detection, we propose a novel context embedding object detection network to improve performance on small objects. Concretely, a context embedding module is introduced to capture multi-scale neighborhood information, and the resolution of the feature map is enlarged to capture the details of small objects. In addition, we have built a new large-scale dataset for small object detection to benchmark the proposed network; it includes more than 58k millimeter wave images. Improvements on two datasets demonstrate the effectiveness of neighborhood context modeling for small object detection.

· For semantic segmentation, we first propose an edge-aware fully convolutional neural network (Edge-aware FCN) to improve segmentation accuracy at boundary regions. By jointly training the edge detection and semantic segmentation tasks, the learned edge-aware features are integrated to guide segmentation at boundary regions. To further refine the details, a post-processing network (Enhanced-Net) based on semantic context modeling is designed. The semantic confidence map generated by Edge-aware FCN is exploited as the semantic context, and multiresolution features with semantic context are fused into Enhanced-Net to encourage the network to focus on confusing regions. Experimental results show that edge and semantic context modeling can effectively improve the accuracy of semantic segmentation in fine details.

· For fine-grained semantic segmentation, we propose a novel fine-grained semantic segmentation network based on local and global context modeling. In particular, we design three modules for capturing local details, semantic edges, and global context information, which results in a simple yet effective Context Embedding with Edge Perceiving (CE2P) framework. Experimental results show that local and global context modeling can effectively improve fine-grained semantic segmentation performance. In addition, focusing on human parsing, we propose a novel human parsing framework based on mutual context modeling. Considering the strong correlation between human pose estimation and human parsing, the pose keypoints and human body parts are used as complementary context for each other. The experimental results show that mutual context modeling can effectively improve the performance of both tasks.

· For fine-grained image content transfer, we propose a spatial-context-based texture generation network. Fine-grained image content transfer aims to transfer the texture content inside a body part from the source person image to the corresponding region in the target image. To transfer the texture content to the target person in an arbitrary view, we exploit a texture map to infer unobserved views of the source image. Specifically, we first design a novel spatial-prior map for modeling the spatial correspondence inside each body part, and then propose a texture map generation network to complete the texture for unobserved views. Experimental results show that spatial context modeling helps preserve texture details in fine-grained image content transfer.
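The abstract does not specify how the context embedding module gathers multi-scale neighborhood information. As a rough illustration of the general idea only, the sketch below averages a single-channel feature map over square windows of several sizes and stacks the results per position. The function names, the window sizes, and the use of plain averaging are all assumptions made for this example; they are not taken from the dissertation.

```python
def window_mean(fmap, r, c, k):
    """Mean of the k x k window centred at (r, c), clipped at the borders."""
    h, w = len(fmap), len(fmap[0])
    half = k // 2
    vals = [
        fmap[i][j]
        for i in range(max(0, r - half), min(h, r + half + 1))
        for j in range(max(0, c - half), min(w, c + half + 1))
    ]
    return sum(vals) / len(vals)


def multi_scale_context(fmap, scales=(1, 3, 5)):
    """For each position, return one window mean per scale (hypothetical helper)."""
    h, w = len(fmap), len(fmap[0])
    return [
        [[window_mean(fmap, r, c, k) for k in scales] for c in range(w)]
        for r in range(h)
    ]


# A toy 3 x 3 feature map with one strong response in the centre.
fmap = [
    [0.0, 0.0, 0.0],
    [0.0, 9.0, 0.0],
    [0.0, 0.0, 0.0],
]
ctx = multi_scale_context(fmap, scales=(1, 3))
print(ctx[1][1])  # centre: its own value, then the mean over its 3 x 3 neighborhood
print(ctx[0][0])  # corner: the 3 x 3 window is clipped to the 2 x 2 region inside the map
```

In this toy case the corner position carries no response of its own, yet its larger-window feature reflects the nearby peak. That is the intuition behind giving a detector neighborhood context for small objects; a learned module would typically use convolutions rather than fixed averages.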
Keywords/Search Tags:context modeling, semantic segmentation, object detection, deep learning, visual understanding