Research On Single Image Depth Estimation Method Based On Multi-scale Attention Mechanism

Posted on: 2022-12-11
Degree: Master
Type: Thesis
Country: China
Candidate: X C Jiang
GTID: 2518306773481274
Subject: Automation Technology
Abstract/Summary:
Image depth estimation extracts depth information from a given image in order to reconstruct a three-dimensional scene. It is one of the most fundamental research areas in computer vision and has important applications in frontier fields such as semantic segmentation and autonomous driving. Depth estimation from a single image is the most difficult variant, because the same image can correspond to multiple 3D scenes, and a computer, unlike a human, cannot draw on rich prior knowledge to judge depth accurately from a single image. In recent years, with the development of neural networks, deep learning methods have been widely used in depth estimation research because of their strong feature extraction and generalization capabilities, which address the poor robustness of traditional methods based on hand-crafted features. Neural network methods obtain more accurate depth estimates, but there is still a large gap between the depth predicted in complex scenes and the ground-truth depth map.

First, this thesis analyzes the state of research on single-image depth estimation in China and abroad and finds that most current methods cannot obtain clear object contours in complex scenes. To address the inaccurate estimation of object edges caused by depth of field, the thesis exploits the relationship between depth estimation and semantic information and proposes a single-image depth estimation method based on a multi-scale Transformer residual pyramid network that can perceive the semantic information of the scene. The method uses three forms of Transformer to build a feature pyramid rich in semantic information and applies it to an encoder-decoder structure, yielding a new neural network model that fuses features across both image space and scale. This produces clearer object edge contours and thereby improves the accuracy of single-image depth estimation.

Then, to address the loss of detail in regions of the depth map with small depth gradients, the thesis optimizes the depth map produced by current single-image depth estimation methods based on the encoder-decoder structure. Most methods today use interpolation or deconvolution for upsampling in the decoder. Interpolation acts as a low-pass filter and can damage high-frequency components, which blurs object details in the predicted depth map. Deconvolution applies the same convolution across the entire image, which limits its response to local depth changes, and it also introduces a large number of parameters, making computation inefficient. The thesis therefore proposes an upsampling method for single-image depth estimation that combines a global attention mechanism with content awareness. It has a large receptive field and can perceive semantic information in the scene without introducing many additional parameters or much extra computation. At the same time, high-level features are used to guide low-level features in restoring an accurate depth map. The method is applied to the decoder of the multi-scale residual pyramid network, so that the network is highly sensitive to depth gradients and the depth map is optimized.

Finally, the proposed network model is trained on the NYU-Depth V2 indoor depth dataset. On the test set it achieves better accuracy and error values than most current single-image depth estimation methods and produces clearer object outlines and details, which verifies the effectiveness of the proposed single-image depth estimation method.
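The abstract's remark that interpolation-based upsampling behaves like a low-pass filter can be illustrated with a small one-dimensional sketch. This is not the thesis's network or its proposed upsampling method. It is only a toy example, and the function names (`downsample`, `upsample_linear`, `max_gradient`) and the depth values are assumptions chosen for illustration. A sharp depth edge is subsampled, as an encoder with stride 2 might do, and then restored by linear interpolation.

```python
def downsample(signal, factor=2):
    """Keep every `factor`-th sample (a stand-in for encoder striding)."""
    return signal[::factor]


def upsample_linear(signal, factor=2):
    """Upsample a 1-D list by linear interpolation between neighbours."""
    out = []
    for i in range(len(signal) - 1):
        a, b = signal[i], signal[i + 1]
        for k in range(factor):
            t = k / factor
            out.append((1 - t) * a + t * b)
    out.append(signal[-1])  # last sample has no right-hand neighbour
    return out


def max_gradient(signal):
    """Largest absolute difference between adjacent samples."""
    return max(abs(b - a) for a, b in zip(signal, signal[1:]))


# Hypothetical depth profile: a near surface at 1.0 m next to a far one at 3.0 m.
depth = [1.0] * 4 + [3.0] * 4
coarse = downsample(depth)
restored = upsample_linear(coarse)

print("original edge step:", max_gradient(depth))
print("restored edge step:", max_gradient(restored))
print("restored profile:  ", restored)
```

In this toy case the restored profile contains an intermediate depth value that does not exist in the original scene, and the steepest step is halved. This is the kind of edge blurring that motivates a content-aware alternative to plain interpolation.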
Keywords/Search Tags:monocular depth estimation, Transformer, Encoder, Decoder, Attention