
Research On Monocular Depth Estimation Based On Deep Learning

Posted on: 2024-01-18 | Degree: Master | Type: Thesis
Country: China | Candidate: J Q Zhao
GTID: 2568307064494794 | Subject: Engineering
Abstract/Summary:
Depth information has great practical value and research significance for intelligent systems that must perceive their surroundings and estimate their own state. Many different 3D scenes can project onto the same 2D image, so inferring depth from a single RGB image is an ill-posed problem. Traditional depth estimation methods, such as stereo matching and structure from motion, are based on multi-view geometry, but the depth maps they produce are sparse. Because these methods cannot achieve dense depth prediction, they are difficult to apply to unmanned navigation systems, robotics, and related fields. With the development of deep neural networks, deep learning-based methods have produced many research results: a deep neural network can predict dense, pixel-level depth accurately and in real time, which helps improve the accuracy of autonomous navigation. Existing monocular depth estimation methods still suffer from low estimation accuracy, blurred object boundaries, and poor generalization of the depth estimation model, which limit the practical adoption of monocular depth estimation. This thesis studies the basic theory of deep learning-based monocular depth estimation and proposes two monocular depth estimation algorithms, one based on supervised learning and one based on self-supervised learning. The main research contents are as follows:

(1) The theoretical basis of monocular depth estimation is studied. First, the camera imaging model, the imaging principle, and the coordinate transformations involved in imaging are introduced. Next, the basic approaches to monocular depth estimation and the methods for solving the camera pose during motion are introduced. Finally, from the perspective of deep learning, the convolution, pooling, nonlinear activation, and fully connected operations of a convolutional neural network are explained.

(2) A supervised monocular depth estimation model is studied, and an encoder-decoder framework based on an attention refinement pyramid is designed. Because a convolution acts only on a local region, a block cross-scale attention mechanism is proposed: with an appropriate block size, the feature maps at each scale are split into image blocks of the same size, shallow and deep features are fused over the global scope, and a reverse split operation then restores the fused features. To address the lack of fine detail in predicted depth, a pyramid refinement module is proposed. The restored features and the refined feature maps from higher scales are processed by the online refinement module, which integrates features at corresponding spatial positions and refines them from coarse to fine in pyramid form.

(3) A self-supervised monocular depth estimation model is studied, and a U-Net framework based on scale aggregation and depth reconstruction is designed. To compensate for the limited feature extraction capability of the framework, a multi-scale feature fusion module is embedded in the skip connections; by adding intermediate nodes, it aggregates the semantic and spatial information of adjacent scales. To address inaccurate pose prediction in existing pose estimation methods, a residual pose estimation network and a depth reconstruction loss are proposed for training: the motion between frames of the image sequence is extracted iteratively, and the image is reconstructed from the iteratively refined pose. In addition, a threshold segmentation mask is proposed to handle abnormal pixels that disturb model convergence during training; moving pixels and low-texture pixels are removed by setting a threshold parameter.

The monocular depth estimation algorithms studied in this thesis can obtain depth and pose information from a monocular image sequence in real time and then reconstruct the monocular scene in 3D. Both algorithms enrich the semantic and spatial information in the depth map, refine object boundaries, and improve depth map quality. Experimental and visual analysis results on the KITTI and NYU Depth V2 datasets show that the proposed algorithms perform well in threshold accuracy and absolute relative error while maintaining running speed. High-precision depth and pose prediction is achieved in both outdoor driving environments and indoor environments.
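The camera imaging model in item (1), and the reason single-image depth is ill-posed, can be illustrated with an ideal pinhole camera: a camera-frame point (X, Y, Z) maps to the pixel u = f_x X / Z + c_x, v = f_y Y / Z + c_y. The sketch below assumes no lens distortion and uses made-up intrinsics; `project` and `back_project` are hypothetical helper names, not code from the thesis.

```python
from typing import Tuple

# Hypothetical intrinsics layout: (fx, fy, cx, cy) in pixels; no lens distortion assumed.
Intrinsics = Tuple[float, float, float, float]
Point3 = Tuple[float, float, float]

def project(point: Point3, k: Intrinsics) -> Tuple[float, float]:
    """Camera-frame point (X, Y, Z) -> pixel (u, v) with the pinhole model."""
    x, y, z = point
    fx, fy, cx, cy = k
    if z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    return fx * x / z + cx, fy * y / z + cy

def back_project(u: float, v: float, depth: float, k: Intrinsics) -> Point3:
    """Pixel (u, v) plus a depth value Z -> camera-frame point (X, Y, Z)."""
    fx, fy, cx, cy = k
    return (u - cx) * depth / fx, (v - cy) * depth / fy, depth

K = (500.0, 500.0, 320.0, 240.0)          # illustrative values, not a real dataset camera
p = (1.0, 0.5, 4.0)
far = tuple(2.0 * c for c in p)           # same viewing ray, twice as far away
print(project(p, K), project(far, K))     # both points land on the same pixel
```

Scaling a point along its viewing ray leaves its pixel unchanged, so one image alone cannot fix Z; back-projection works only once a depth value is supplied, which is the quantity a depth network predicts.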
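Item (1) also walks through the convolution, pooling, nonlinearity, and full-connection stages of a convolutional neural network. A dependency-free sketch of one pass through those four stages on a single-channel toy image follows; the image, kernel, and weights are illustrative values, whereas real networks use many channels, learned weights, and padding.

```python
from typing import List

Matrix = List[List[float]]

def conv2d(image: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' 2D convolution, stride 1 (cross-correlation, as CNN libraries implement it)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
         for j in range(out_w)]
        for i in range(out_h)
    ]

def relu(fmap: Matrix) -> Matrix:
    """Nonlinearity: negative responses are set to zero."""
    return [[max(0.0, x) for x in row] for row in fmap]

def max_pool2x2(fmap: Matrix) -> Matrix:
    """2x2 max pooling with stride 2; an odd trailing row or column is dropped."""
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, len(fmap[0]) - 1, 2)]
        for i in range(0, len(fmap) - 1, 2)
    ]

def fully_connected(x: List[float], weights: Matrix, bias: List[float]) -> List[float]:
    """y = W x + b, one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

image = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]  # vertical edge between columns 1 and 2
edge_kernel = [[-1.0, 1.0], [-1.0, 1.0]]              # responds to a left-to-right intensity rise
pooled = max_pool2x2(relu(conv2d(image, edge_kernel)))
flat = [v for row in pooled for v in row]
logits = fully_connected(flat, [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]], [0.0, 0.0])
print(pooled, logits)
```

The 5x5 input shrinks to 4x4 after the 2x2 convolution and to 2x2 after pooling, and the edge response survives each stage before the fully connected layer mixes it into two outputs.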
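The block cross-scale attention in item (2) can be pictured as three steps: split each feature map into blocks of one fixed size, let every deep-feature block attend to all shallow-feature blocks, and reverse the split. The sketch below is a hypothetical simplification, not the thesis's module: it uses single-channel maps, identity query/key/value projections instead of learned ones, and plain scaled dot-product attention. Because both maps use the same block size p, every block flattens to a token of the same length p*p, which is what lets a coarse map attend to a finer one.

```python
import math
from typing import List

Matrix = List[List[float]]
Token = List[float]

def split_blocks(fmap: Matrix, p: int) -> List[Token]:
    """Cut an H x W map into non-overlapping p x p blocks (row-major), each flattened to a token."""
    h, w = len(fmap), len(fmap[0])
    if h % p or w % p:
        raise ValueError("H and W must be multiples of the block size p")
    return [
        [fmap[bi + a][bj + b] for a in range(p) for b in range(p)]
        for bi in range(0, h, p)
        for bj in range(0, w, p)
    ]

def merge_blocks(tokens: List[Token], h: int, w: int, p: int) -> Matrix:
    """Reverse split: put each flattened block back at its original position."""
    out = [[0.0] * w for _ in range(h)]
    blocks_per_row = w // p
    for idx, tok in enumerate(tokens):
        bi, bj = (idx // blocks_per_row) * p, (idx % blocks_per_row) * p
        for a in range(p):
            for b in range(p):
                out[bi + a][bj + b] = tok[a * p + b]
    return out

def cross_attention(queries: List[Token], keys: List[Token], values: List[Token]) -> List[Token]:
    """Scaled dot-product attention: each query block attends to every key block (global scope)."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(wt * v[c] for wt, v in zip(weights, values)) / total for c in range(d)])
    return out

deep = [[0.0, 0.0], [0.0, 0.0]]                         # coarse map: one 2x2 block
shallow = [[1.0, 1.0, 2.0, 2.0], [1.0, 1.0, 2.0, 2.0],
           [3.0, 3.0, 4.0, 4.0], [3.0, 3.0, 4.0, 4.0]]  # fine map: four 2x2 blocks
p = 2
attended = cross_attention(split_blocks(deep, p), split_blocks(shallow, p), split_blocks(shallow, p))
fused = merge_blocks(attended, len(deep), len(deep[0]), p)
print(fused)
```

With an all-zero query every shallow block gets equal weight, so the fused block is the mean of the four shallow blocks; a query that resembles one block shifts the weights toward it.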
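One way to picture the coarse-to-fine pyramid refinement in item (2) is with plain 2D maps: each level upsamples the estimate from the coarser level and adds a correction computed at the same spatial positions. In this hypothetical sketch the per-level corrections are supplied by hand; in the thesis they would come from the refinement module operating on the restored features, whose internals the abstract does not describe.

```python
from typing import List

Matrix = List[List[float]]

def upsample2x(fmap: Matrix) -> Matrix:
    """Nearest-neighbour 2x upsampling: each value is copied into a 2x2 patch."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum of two maps at the same resolution (corresponding spatial positions)."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        raise ValueError("maps must have the same shape")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def coarse_to_fine(coarsest: Matrix, residuals: List[Matrix]) -> Matrix:
    """Refine level by level: upsample the previous estimate, then add that level's correction."""
    estimate = coarsest
    for residual in residuals:  # ordered coarse to fine; each level is twice the previous size
        estimate = add(upsample2x(estimate), residual)
    return estimate

level1 = [[0.0, 0.5], [-0.5, 0.0]]         # hand-picked correction at 2x2 (stand-in for a network output)
level2 = [[0.0] * 4 for _ in range(4)]     # no further correction at 4x4
print(coarse_to_fine([[1.0]], [level1, level2]))
```

Each level only has to predict what the coarser estimate got wrong, which is the usual motivation for residual coarse-to-fine pyramids.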
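Item (3) reconstructs the target image from an iteratively refined pose. The geometric core of that step can be sketched as follows, assuming a pose is stored as a rotation matrix R and translation t that map target-camera points into the source camera (X_s = R X_t + t), and that each iteration outputs a small correction applied on the left, T_k = dT_k composed with T_(k-1). The residual values are hand-picked stand-ins for network outputs; the abstract does not specify the thesis's pose parameterization or update rule, and the bilinear sampling that produces the reconstructed image is omitted.

```python
from typing import List, Tuple

Mat3 = List[List[float]]
Vec3 = List[float]
Pose = Tuple[Mat3, Vec3]  # (R, t) with X_source = R @ X_target + t (assumed convention)

def matmul3(a: Mat3, b: Mat3) -> Mat3:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def apply(pose: Pose, x: Vec3) -> Vec3:
    r, t = pose
    return [sum(r[i][k] * x[k] for k in range(3)) + t[i] for i in range(3)]

def compose(delta: Pose, pose: Pose) -> Pose:
    """Apply `pose` first, then `delta`: X -> R_d (R X + t) + t_d."""
    r_d, _ = delta
    r, t = pose
    return matmul3(r_d, r), apply(delta, t)

def refine_pose(initial: Pose, residuals: List[Pose]) -> Pose:
    """Hypothetical residual refinement: T_k = dT_k o T_(k-1) for each predicted correction."""
    pose = initial
    for delta in residuals:
        pose = compose(delta, pose)
    return pose

def reproject(u: float, v: float, depth: float,
              k: Tuple[float, float, float, float], pose: Pose) -> Tuple[float, float]:
    """Target pixel + predicted depth -> 3D point -> source camera -> source pixel."""
    fx, fy, cx, cy = k
    x_t = [(u - cx) * depth / fx, (v - cy) * depth / fy, depth]
    x_s = apply(pose, x_t)
    if x_s[2] <= 0:
        raise ValueError("point falls behind the source camera")
    return fx * x_s[0] / x_s[2] + cx, fy * x_s[1] / x_s[2] + cy

I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
K = (500.0, 500.0, 320.0, 240.0)  # illustrative intrinsics
refined = refine_pose((I3, [0.5, 0.0, 0.0]), [(I3, [0.25, 0.0, 0.0])])
print(refined[1], reproject(320.0, 240.0, 5.0, K, refined))
```

The reprojected coordinates tell the loss where to sample the source frame; comparing that sampled image with the target gives the reconstruction error that trains both depth and pose.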
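The abstract does not give the exact rule behind the threshold segmentation mask, so the following is a hypothetical sketch of one plausible version. A pixel is kept only if (a) warping lowers its photometric error compared with the unwarped source frame, which drops low-texture regions and objects moving with the camera, similar in spirit to Monodepth2's auto-masking, and (b) its remaining error is below a threshold `tau`, standing in for the thesis's threshold parameter, which drops large residuals typical of independently moving objects. Images are flattened grayscale lists and the error is a plain L1 difference.

```python
from typing import List

def l1_error(a: List[float], b: List[float]) -> List[float]:
    """Per-pixel absolute photometric difference."""
    return [abs(x - y) for x, y in zip(a, b)]

def threshold_mask(target: List[float], warped: List[float],
                   source: List[float], tau: float) -> List[int]:
    """1 = pixel used in the loss, 0 = pixel masked out (hypothetical rule, see lead-in)."""
    warped_err = l1_error(target, warped)
    identity_err = l1_error(target, source)
    return [1 if we < ie and we < tau else 0 for we, ie in zip(warped_err, identity_err)]

def masked_photometric_loss(target: List[float], warped: List[float], mask: List[int]) -> float:
    """Mean L1 error over the kept pixels only."""
    err = l1_error(target, warped)
    kept = sum(mask)
    return sum(e * m for e, m in zip(err, mask)) / kept if kept else 0.0

target = [0.2, 0.5, 0.9, 0.4]
warped = [0.21, 0.5, 0.3, 0.4]  # pixel 2 is badly reconstructed (e.g. a moving object)
source = [0.6, 0.5, 0.1, 0.8]   # pixel 1 looks identical without warping (low texture)
mask = threshold_mask(target, warped, source, tau=0.2)
print(mask, masked_photometric_loss(target, warped, mask))
```

Excluding these pixels keeps unreliable errors from dominating the gradient, which is the convergence problem item (3) describes.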
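The two reported metrics have standard definitions in the monocular depth literature: absolute relative error AbsRel = (1/N) Σ |d_i − d_i*| / d_i*, and threshold accuracy δ < thr, the fraction of pixels with max(d_i / d_i*, d_i* / d_i) < thr, usually for thr = 1.25, 1.25², 1.25³. A minimal sketch, assuming invalid ground-truth pixels are stored as 0 (common for sparse LiDAR ground truth) and predictions are strictly positive; benchmark-specific crops, depth caps, and median scaling are left out.

```python
from typing import List, Tuple

def valid_pairs(pred: List[float], gt: List[float]) -> List[Tuple[float, float]]:
    """Keep only pixels with valid ground truth (assumed encoded as gt > 0)."""
    return [(p, g) for p, g in zip(pred, gt) if g > 0]

def abs_rel(pred: List[float], gt: List[float]) -> float:
    """Mean of |d - d*| / d* over valid pixels."""
    pairs = valid_pairs(pred, gt)
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)

def threshold_accuracy(pred: List[float], gt: List[float], threshold: float = 1.25) -> float:
    """Fraction of valid pixels with max(d / d*, d* / d) < threshold; predictions assumed > 0."""
    pairs = valid_pairs(pred, gt)
    hits = sum(1 for p, g in pairs if max(p / g, g / p) < threshold)
    return hits / len(pairs)

pred = [2.0, 4.5, 11.0, 1.0]
gt = [2.0, 5.0, 8.0, 0.0]  # last pixel has no ground truth and is ignored
print(abs_rel(pred, gt), threshold_accuracy(pred, gt), threshold_accuracy(pred, gt, 1.25 ** 2))
```

Lower AbsRel and higher δ values are better; the three thresholds show how tolerant each score is of large relative errors.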
Keywords/Search Tags: Monocular Depth Estimation, Deep Learning, Attention Mechanism, Pose Estimation, Image Reconstruction