Research On Self-supervised Monocular Depth Estimation Method Based On Attention Mechanism

Posted on: 2022-09-06
Degree: Master
Type: Thesis
Country: China
Candidate: X D Kong
Full Text: PDF
GTID: 2518306353979069
Subject: Electronics and Communications Engineering
Abstract/Summary:
One of the main tasks of computer and robot vision is perceiving the 3D world. Depth and ego-motion estimation play essential roles in perceiving 3D scenes from videos and images, and have broad applications in fields such as robotics and autonomous driving. To obtain accurate and robust depth estimates in these areas, range-finding sensors such as LiDAR or stereo/multi-camera rigs are often deployed. In practice, however, obtaining depth from sensors in this way is expensive and complex. This has led to the gradual development of learning-based methods, among which supervised convolutional neural networks that rely on ground-truth depth are the most successful. The excellent monocular depth estimation results of supervised methods depend on ground-truth RGB-D data, yet collecting large, accurate ground-truth data sets is difficult because of sensor noise and limited operating conditions (lighting, weather, etc.). RGB images and video, on the other hand, are relatively easy to obtain. In recent years, using synchronised stereo pairs or monocular video to obtain depth maps in a self-supervised manner has attracted growing interest. Because training video sequences are widely available, monocular video has become a powerful alternative to stereo images for learning.

This thesis improves a current state-of-the-art self-supervised joint learning framework that uses consecutive monocular frames for depth estimation. The improved framework adds triplet attention to the depth and pose networks. The module builds inter-dimensional dependencies through a rotation operation followed by residual transformations, and computes attention weights from the interactions between different dimensions of the feature tensor, so it can weigh the importance of the features along each dimension. Unlike a typical channel attention module, this module has no information bottleneck, which makes optimization of the learning framework more reliable. Funnel activation (FReLU) replaces ELU in the nonlinear activation layers and adaptively captures local context information in the image. Through the pixel-level modeling capability provided by its spatial condition, funnel activation can enrich the details of complex images in the estimated depth map.

To verify the effectiveness of the improved network, it was tested on the KITTI data set. The experimental results show that the improved network has clear advantages over the state-of-the-art self-supervised monocular depth estimation method.
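
The funnel activation described above can be illustrated with a minimal sketch. It assumes the commonly cited form of FReLU, y = max(x, T(x)), where T(x) is a per-channel spatial window (the "spatial condition") applied around each pixel. The function names `funnel_condition` and `frelu`, the single-channel list-of-lists input, the zero padding, and the omission of batch normalization are all simplifications made for illustration; none of them are taken from the thesis.

```python
def funnel_condition(x, kernel, bias=0.0):
    """Spatial condition T(x): slide a k x k window over a single-channel
    2D map with zero padding (hypothetical helper, batch norm omitted)."""
    h, w = len(x), len(x[0])
    k = len(kernel)
    r = k // 2  # window radius; k is assumed to be odd
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = bias
            for di in range(k):
                for dj in range(k):
                    ii, jj = i + di - r, j + dj - r
                    # Positions outside the map contribute zero.
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += kernel[di][dj] * x[ii][jj]
            out[i][j] = acc
    return out


def frelu(x, kernel, bias=0.0):
    """Funnel activation sketch: element-wise max(x, T(x))."""
    t = funnel_condition(x, kernel, bias)
    return [[max(a, b) for a, b in zip(row_x, row_t)]
            for row_x, row_t in zip(x, t)]


# An all-zero window gives T(x) = 0, so the sketch reduces to plain ReLU.
zero_kernel = [[0.0] * 3 for _ in range(3)]

# A window that copies the right-hand neighbour makes each output
# depend on local spatial context rather than on the pixel alone.
right_kernel = [[0.0, 0.0, 0.0],
                [0.0, 0.0, 1.0],
                [0.0, 0.0, 0.0]]

feature_map = [[-1.0, 2.0],
               [3.0, -4.0]]
relu_like = frelu(feature_map, zero_kernel)
context_aware = frelu(feature_map, right_kernel)
```

In a trained network the window weights would be learned per channel, so the threshold each pixel is compared against varies with its neighbourhood. That is the sense in which the activation is said to capture local context, in contrast to a fixed point-wise function such as ELU.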
Keywords/Search Tags: convolutional neural network, self-supervised learning, monocular depth estimation, attention mechanism