
Research And Implementation Of Monocular Depth Estimation For 3D Reconstruction

Posted on: 2021-07-23 | Degree: Master | Type: Thesis
Country: China | Candidate: J F Cao | Full Text: PDF
GTID: 2518306308969779 | Subject: Computer technology
Abstract/Summary:
Scene depth estimation is one of the central tasks in computer vision. Accurately recovering scene depth from images is essential for reconstructing the three-dimensional structure of a scene. It also benefits other vision tasks such as object detection and semantic segmentation. Monocular depth estimation is inherently ill-posed, because a single image is consistent with infinitely many 3D scenes. For this reason, traditional approaches have mainly relied on structured light, binocular stereo, and similar sensing setups. With the development of deep learning, monocular depth estimation has again attracted researchers' attention. This thesis proposes a deep convolutional neural network (DCNN), guided by occlusion cues and aggregating scene context, that predicts a depth map from a single RGB image. The model introduces three dedicated modules and is trained with a multi-task loss. Together these substantially improve the accuracy of monocular depth estimation while preserving the structural information of the scene in the predicted depth map.

Monocular depth estimation is a pixel-wise regression task, so the network adopts an encoder-decoder architecture. The proposed model consists of five parts:

- an Encoder;
- a Decoder;
- a Global Information Extractor;
- an Occlusion Learner;
- a Strip Refinement Module.

The encoder applies operations such as convolution and pooling to extract features at multiple levels of the image. The decoder uses deconvolution (transposed convolution) to restore feature resolution and predict the depth layout of the scene.

The Global Information Extractor retains richer global information and thereby captures the global depth layout of the scene. It combines dilated (atrous, or "hole") convolution with average pooling, fusing scene context from several large receptive fields and sub-regions.

The Occlusion Learner fuses features from different levels. It uses high-level semantic features to guide low-level ones, progressively filtering out detail that is unrelated to depth changes.

The Strip Refinement Module consists of two orthogonal strip convolutions and residual modules. It integrates the global depth layout with the occlusion cues and infers the depth value of each pixel in the scene.

To validate the model, quantitative and qualitative experiments are carried out on the NYU Depth v2 dataset, and depth inference on the SUN RGB-D dataset is used to examine how well the model generalizes. In addition, a ROS-based 3D reconstruction system is implemented. In this system, the depth map predicted by the proposed model replaces the depth map captured by a Kinect as the system input, producing reasonable reconstruction results.
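The Global Information Extractor is described as combining dilated convolution with average pooling over sub-regions. The sketch below illustrates both operations on a single-channel map in plain Python, assuming a pyramid-style layout with parallel branches. The names `dilated_conv2d`, `region_avg_pool` and `global_context`, and the default dilations (1, 2, 4) and bin counts (1, 2), are hypothetical choices for illustration, not the thesis's configuration.

```python
from typing import List, Sequence

Grid = List[List[float]]


def dilated_conv2d(x: Grid, kernel: Grid, dilation: int) -> Grid:
    """Same-size 2D dilated ("atrous"/"hole") convolution with zero padding.

    As in most deep-learning frameworks this is a cross-correlation. Kernel
    taps are `dilation` pixels apart, so a 3x3 kernel with dilation d sees a
    (2d + 1) x (2d + 1) window while still using only 9 weights.
    """
    h, w = len(x), len(x[0])
    k = len(kernel)
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(k):
                for b in range(k):
                    ii, jj = i + (a - r) * dilation, j + (b - r) * dilation
                    if 0 <= ii < h and 0 <= jj < w:  # zero padding outside
                        acc += kernel[a][b] * x[ii][jj]
            out[i][j] = acc
    return out


def region_avg_pool(x: Grid, bins: int) -> Grid:
    """Average-pool over a bins x bins grid of sub-regions and broadcast each
    region mean back to full resolution. Assumes bins <= height and width."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for bi in range(bins):
        r0, r1 = bi * h // bins, (bi + 1) * h // bins
        for bj in range(bins):
            c0, c1 = bj * w // bins, (bj + 1) * w // bins
            cells = [x[i][j] for i in range(r0, r1) for j in range(c0, c1)]
            mean = sum(cells) / len(cells)
            for i in range(r0, r1):
                for j in range(c0, c1):
                    out[i][j] = mean
    return out


def global_context(x: Grid, kernel: Grid,
                   dilations: Sequence[int] = (1, 2, 4),
                   bins: Sequence[int] = (1, 2)) -> List[Grid]:
    """Hypothetical extractor: parallel dilated-conv and region-pooling branches.

    Returns the branches as a channel stack; in a real network a learned 1x1
    convolution would fuse them into a single feature map.
    """
    branches = [dilated_conv2d(x, kernel, d) for d in dilations]
    branches += [region_avg_pool(x, b) for b in bins]
    return branches
```

The dilated branches enlarge the receptive field without adding weights or reducing resolution. The pooled branches summarize whole sub-regions, which is one common way to expose scene-level layout to every pixel.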
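The Occlusion Learner is described as using high-level semantic features to guide low-level features and progressively filtering out detail unrelated to depth changes. The abstract does not specify the guidance mechanism. The sketch below assumes a simple sigmoid gating scheme, applied coarse to fine. The gate is computed from the upsampled high-level map and multiplied into the low-level map, and the high-level map is added back as a residual. `upsample2x`, `guided_fusion` and `progressive_fusion` are hypothetical names, and the gating is one plausible design rather than the thesis's exact module.

```python
import math
from typing import List

Grid = List[List[float]]


def upsample2x(x: Grid) -> Grid:
    """Nearest-neighbour 2x upsampling so a coarse map matches a finer one."""
    return [[v for v in row for _ in (0, 1)] for row in x for _ in (0, 1)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def guided_fusion(low: Grid, high: Grid) -> Grid:
    """Gate a low-level map with the sigmoid of an upsampled high-level map.

    Where the high-level response is strongly negative the gate is close to 0,
    suppressing low-level detail (e.g. surface texture); where it is strongly
    positive (e.g. a likely depth discontinuity) detail passes through. The
    upsampled high-level map is added back as a residual path.
    """
    up = upsample2x(high)
    return [[lv * sigmoid(hv) + hv for lv, hv in zip(low_row, up_row)]
            for low_row, up_row in zip(low, up)]


def progressive_fusion(levels: List[Grid]) -> Grid:
    """Fuse coarse-to-fine; each level must be twice the size of the previous."""
    fused = levels[0]
    for finer in levels[1:]:
        fused = guided_fusion(finer, fused)
    return fused
```

Fusing level by level lets each finer map be filtered by everything coarser than it. This mirrors the "gradually screens out" behaviour the abstract describes.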
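The Strip Refinement Module is built from two orthogonal strip convolutions plus residual modules. The sketch below makes two assumptions: the 1 x k (horizontal) and k x 1 (vertical) branches run in parallel and are summed, and the result passes through a ReLU before being added back to the input. The abstract does not say whether the strips are parallel or sequential. `strip_conv_h`, `strip_conv_v` and `strip_refine` are hypothetical names. In the actual network the input would be the fused global-layout and occlusion features rather than a single map.

```python
from typing import List, Sequence

Grid = List[List[float]]


def strip_conv_h(x: Grid, w: Sequence[float]) -> Grid:
    """1 x k strip convolution along rows (zero padding, same size)."""
    h, wd, r = len(x), len(x[0]), len(w) // 2
    out = [[0.0] * wd for _ in range(h)]
    for i in range(h):
        for j in range(wd):
            for b, wb in enumerate(w):
                jj = j + b - r
                if 0 <= jj < wd:
                    out[i][j] += wb * x[i][jj]
    return out


def strip_conv_v(x: Grid, w: Sequence[float]) -> Grid:
    """k x 1 strip convolution along columns (zero padding, same size)."""
    h, wd, r = len(x), len(x[0]), len(w) // 2
    out = [[0.0] * wd for _ in range(h)]
    for i in range(h):
        for j in range(wd):
            for a, wa in enumerate(w):
                ii = i + a - r
                if 0 <= ii < h:
                    out[i][j] += wa * x[ii][j]
    return out


def strip_refine(x: Grid, w_h: Sequence[float], w_v: Sequence[float]) -> Grid:
    """Residual refinement: x + ReLU(horizontal strip + vertical strip).

    The two orthogonal branches give each pixel a cross-shaped receptive
    field spanning k pixels along its row and its column.
    """
    hs, vs = strip_conv_h(x, w_h), strip_conv_v(x, w_v)
    return [[x[i][j] + max(0.0, hs[i][j] + vs[i][j]) for j in range(len(x[0]))]
            for i in range(len(x))]
```

Strip kernels reach far along one axis at the cost of only k weights each. This suits elongated structures such as walls, floors and table edges that dominate indoor depth layouts.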
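The abstract reports quantitative experiments on NYU Depth v2 but does not list the metrics. The sketch below computes the error and accuracy measures commonly reported for monocular depth estimation on that benchmark: mean absolute relative error, RMSE, mean log10 error, and the threshold accuracies delta < 1.25^k. It assumes flattened per-pixel depths, with ground truth <= 0 marking missing values. The function name `depth_metrics` is hypothetical.

```python
import math
from typing import Dict, Sequence


def depth_metrics(pred: Sequence[float], gt: Sequence[float]) -> Dict[str, float]:
    """Common monocular-depth error/accuracy metrics over valid pixels.

    Pixels with gt <= 0 are treated as missing and skipped; predictions must
    be strictly positive for the log and ratio terms.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    n = len(pairs)
    if n == 0:
        raise ValueError("no valid ground-truth pixels")
    metrics = {
        "abs_rel": sum(abs(p - g) / g for p, g in pairs) / n,
        "rmse": math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n),
        "log10": sum(abs(math.log10(p) - math.log10(g)) for p, g in pairs) / n,
    }
    ratios = [max(p / g, g / p) for p, g in pairs]
    for k in (1, 2, 3):
        # Threshold accuracy: fraction of pixels with max(p/g, g/p) < 1.25^k.
        metrics[f"delta{k}"] = sum(r < 1.25 ** k for r in ratios) / n
    return metrics
```

Lower is better for the three error terms and higher is better for the delta accuracies.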
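Using a predicted depth map in place of Kinect depth for 3D reconstruction requires back-projecting each pixel into 3D with the camera's pinhole intrinsics. The sketch below shows that step in isolation. It assumes the predicted depth is metric and registered to the camera whose focal lengths `fx`, `fy` and principal point `cx`, `cy` are supplied; these are placeholder parameters that would come from calibration. The ROS side, such as publishing or fusing the points, is not shown, and `backproject` is a hypothetical name.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def backproject(depth: List[List[float]], fx: float, fy: float,
                cx: float, cy: float) -> List[Point]:
    """Pinhole back-projection of a depth map into camera-frame 3D points.

    Pixel (u, v) with depth Z maps to
        X = (u - cx) * Z / fx,  Y = (v - cy) * Z / fy,  Z = Z.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # skip invalid / missing depth
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points
```

Because the mapping is linear in Z, any global scale error in the predicted depth scales the reconstructed geometry by the same factor. Metric accuracy on NYU Depth v2 therefore carries over directly to reconstruction quality.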
Keywords/Search Tags: monocular depth estimation, DCNN, regression, occlusion cue, 3D reconstruction