Deep Learning Based Monocular Depth Estimation

Posted on: 2021-03-20  Degree: Master  Type: Thesis
Country: China  Candidate: X N Zhang
GTID: 2428330620461346  Subject: Computer Science and Technology
Abstract/Summary:
Image depth estimation is an important research topic in computer vision. Depth information is essential for understanding the three-dimensional structure of a scene, and accurate depth helps in interpreting the scene. It is widely used in true 3D volumetric display, semantic segmentation, autonomous driving, and 3D reconstruction. Traditional methods mostly estimate depth from binocular or multi-view images; the most common approach is stereo matching, which uses triangulation to estimate scene depth from the images. However, stereo matching is easily affected by scene diversity and is computationally expensive. Acquiring a monocular image places fewer demands on equipment and environmental conditions, so depth estimation from a monocular image is closer to practical conditions and applies to a wider range of scenarios. With the rapid development of deep learning, methods based on convolutional neural networks have made progress in image depth estimation, but monocular depth estimation still faces many challenges: complex textures and geometric structures in complex scenes lead to loss of local detail, distorted object boundaries, and blurry reconstruction, directly reducing the accuracy of the reconstruction. To address these issues, this dissertation studies monocular depth estimation based on deep learning. The main work covers the following two aspects:

First, a network model based on multi-scale residual pyramid attention is proposed to address the object boundary distortion and loss of local detail caused by complex textures and geometric structures in indoor scenes. The model first introduces a multi-scale attention context aggregation module, which consists of a spatial attention module and a global attention module. By considering the positional correlation and scale correlation of pixels from spatial and global perspectives, the module captures the spatial context and scale context of the features. By aggregating the spatial and scale context, the module adaptively learns the similarity between pixels, obtaining more global context information about the image and handling complex structures in the scene. Then, because local details of objects are easily overlooked in scene understanding, an enhanced residual refinement module is proposed, which captures more semantic and detail information while extracting multi-scale features, further refining the scene structure. Experimental results on the NYU Depth V2 dataset show that the method performs well on object boundaries and local details.

Second, to address the inaccurate prediction of details and blurry reconstruction in existing unsupervised depth estimation methods, and considering that non-local operations can capture the long-range spatial dependence of each pixel and obtain more spatial context, this dissertation proposes a new unsupervised depth estimation model that introduces non-local operations. The model is trained without supervision, takes video image sequences as input, and combines camera motion to estimate the depth of outdoor scenes. Experimental results on the KITTI dataset show that the depth maps estimated by this method are clearer at object boundaries and recover more detail. In addition, the proposed model is robust and can be applied to different depth estimation network models.
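The abstract does not give the exact form of the non-local operation used in the second model. As a rough illustration only, the sketch below shows a generic non-local (self-attention) block in its common embedded-Gaussian form, in which every pixel's output is a weighted sum over all other pixels. The function name `non_local_block`, the projection matrices, and the residual connection are assumptions made for this example, not the dissertation's implementation.

```python
import numpy as np


def non_local_block(features, w_theta, w_phi, w_g, w_out):
    """Generic non-local block (illustrative sketch, not the thesis model).

    features: (H, W, C) feature map.
    w_theta, w_phi, w_g: (C, D) projection matrices (hypothetical names).
    w_out: (D, C) projection back to the input channel count.
    Returns the refined (H, W, C) feature map and the (N, N) attention matrix,
    where N = H * W.
    """
    h, w, c = features.shape
    x = features.reshape(h * w, c)      # flatten spatial positions: (N, C)

    theta = x @ w_theta                 # per-pixel query embedding: (N, D)
    phi = x @ w_phi                     # per-pixel key embedding:   (N, D)
    g = x @ w_g                         # per-pixel value embedding: (N, D)

    # Pairwise similarity between every pair of positions, then a softmax
    # over all positions so each row is a distribution over the whole image.
    scores = theta @ phi.T              # (N, N)
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)

    # Each position aggregates values from all positions (long-range context),
    # and the result is added back to the input as a residual.
    y = attn @ g                        # (N, D)
    out = x + y @ w_out                 # (N, C)
    return out.reshape(h, w, c), attn
```

Because the attention matrix has one row per pixel and one column per pixel, its memory cost grows quadratically with the number of positions, which is why blocks of this kind are usually applied to low-resolution feature maps rather than to the full image.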
Keywords/Search Tags:Image depth estimation, Attention module, Deep learning, Convolutional neural network, Context information