
Pixel-wise Scene Understanding Based On Fully Convolutional Networks

Posted on: 2021-08-10
Degree: Master
Type: Thesis
Country: China
Candidate: Z Z Jiang
Full Text: PDF
GTID: 2518306104987159
Subject: Control Science and Engineering
Abstract/Summary:
Scene understanding is an important research topic in computer vision, with wide applications in autonomous navigation, autonomous driving, unmanned aerial vehicles, and assistive systems for the blind. Depth information and semantic information are key to understanding a scene: from a single RGB image, pixel-wise depth and semantic labels can be obtained by monocular depth estimation and semantic segmentation, respectively. In recent years, deep-learning-based monocular depth estimation and semantic segmentation algorithms have made great progress, but they still face many challenges due to the complexity and diversity of scenes. In view of these problems, this thesis carries out the following research work.

This thesis designs an end-to-end pixel-wise scene understanding framework based on fully convolutional networks, which can be applied to either monocular depth estimation or semantic segmentation. The framework adopts an encoder-decoder structure: it takes ResNet as the encoder for feature extraction and uses dilated convolution to enlarge the receptive field. In the decoder, bilinear interpolation performs step-by-step upsampling, and feature maps of the same size in the encoder and decoder are concatenated via skip connections.

For monocular depth estimation, this thesis proposes a joint loss function with three terms, combining errors in depth, gradient, and surface normal. Because the three terms are complementary, the predicted depth map is more accurate and better preserves the geometric details of objects and the spatial structure of the scene. For semantic segmentation, this thesis proposes an edge-aware loss that uses edge information as an explicit constraint; this reduces the blurring of boundaries between different categories and improves segmentation accuracy.

For pixel-wise dense prediction tasks, both low-level and high-level features are important, yet the
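The abstract names the three loss terms but not their exact formulation. As an illustration only, the NumPy sketch below shows one plausible combination of depth, gradient, and surface-normal errors; the term weights, the L1 error form, and the finite-difference normal estimate are assumptions, not the thesis's stated design.

```python
import numpy as np

def joint_depth_loss(pred, gt, w_grad=1.0, w_normal=1.0):
    """Sketch of a three-term depth loss: depth + gradient + surface normal.

    pred, gt: 2-D depth maps of the same shape.
    w_grad, w_normal: assumed balancing weights.
    """
    # Depth term: mean absolute error between predicted and ground-truth depth.
    l_depth = np.mean(np.abs(pred - gt))

    # Gradient term: finite-difference image gradients should also match,
    # which penalizes blurred or displaced depth edges.
    gy_p, gx_p = np.gradient(pred)
    gy_g, gx_g = np.gradient(gt)
    l_grad = np.mean(np.abs(gx_p - gx_g) + np.abs(gy_p - gy_g))

    # Surface-normal term: normals derived from the gradients,
    # n = (-dz/dx, -dz/dy, 1) normalized, compared by cosine similarity.
    def normals(gx, gy):
        n = np.stack([-gx, -gy, np.ones_like(gx)], axis=-1)
        return n / np.linalg.norm(n, axis=-1, keepdims=True)

    n_p = normals(gx_p, gy_p)
    n_g = normals(gx_g, gy_g)
    l_normal = np.mean(1.0 - np.sum(n_p * n_g, axis=-1))

    return l_depth + w_grad * l_grad + w_normal * l_normal
```

A constant offset between `pred` and `gt` is penalized only by the depth term, while the gradient and normal terms focus on local geometry, which is the complementarity the abstract points to.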
feature fusion achieved by plain skip connections is not ideal. We propose an Adaptive Multi-scale Feature Fusion (AMFF) module, which can replace the skip connection and be applied flexibly in fully convolutional networks with an encoder-decoder structure. Experimental results show that AMFF fuses low-level high-resolution features with high-level low-resolution features more effectively and improves the performance of the algorithm.

Depth and semantic information in a scene are correlated, and a multi-task model can obtain both kinds of information simultaneously, reducing computation and improving efficiency. This thesis therefore proposes a multi-task scene understanding network that performs monocular depth estimation and semantic segmentation at the same time. The network uses a single encoder for feature extraction, shared by the two visual tasks. For decoding, this thesis designs two schemes, independent decoding and interactive decoding; the latter adds a Lateral Interaction Unit (LIU) on top of the former, responsible for the exchange between depth cues and semantic information. Experimental results show that interactive decoding outperforms independent decoding, indicating that depth and semantic information are indeed correlated and can be learned jointly to guide and benefit each other.
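The abstract does not describe AMFF's internal mechanism. Purely as an illustration of the idea of replacing a fixed skip connection with an input-dependent fusion, the sketch below blends a high-resolution low-level map with an upsampled low-resolution high-level map using softmax gates computed from global average pooling; the gating scheme, the nearest-neighbour upsampling, and the function name `amff_fuse` are all hypothetical, not the thesis's design.

```python
import numpy as np

def amff_fuse(low_feat, high_feat):
    """Hypothetical adaptive fusion of two feature maps.

    low_feat:  low-level, high-resolution features, shape (C, H, W).
    high_feat: high-level, low-resolution features, shape (C, H/s, W/s).
    Real AMFF presumably learns its fusion weights end to end.
    """
    c, h, w = low_feat.shape
    ch, hh, wh = high_feat.shape
    assert c == ch and h % hh == 0 and w % wh == 0

    # Nearest-neighbour upsampling so both branches share a spatial size.
    up = high_feat.repeat(h // hh, axis=1).repeat(w // wh, axis=2)

    # Input-dependent scalar gates: global average pooling + softmax,
    # so the mix between branches varies with the input ("adaptive").
    scores = np.array([low_feat.mean(), up.mean()])
    gates = np.exp(scores - scores.max())
    gates /= gates.sum()

    # Gated sum replaces the plain concatenation of a skip connection.
    return gates[0] * low_feat + gates[1] * up
```

Because the gates sum to one, the output stays a convex combination of the two branches; a learned version would replace the pooling-based scores with a small trained subnetwork.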
Keywords/Search Tags: Scene understanding, Monocular depth estimation, Semantic segmentation, Fully convolutional network, Multi-task learning