
Research On Depth Map Estimation Based On CNN And Visual SLAM

Posted on: 2021-03-12    Degree: Master    Type: Thesis
Country: China    Candidate: R Y Dai    Full Text: PDF
GTID: 2428330647467269    Subject: Intelligent perception and control
Abstract/Summary:
Depth map acquisition is an important foundation of 3D scene perception and 3D reconstruction. Sensors commonly used to obtain depth information, such as lidar or the Kinect, suffer from high cost and short sensing range, which limits their application. In recent years, with the wide application of deep learning, monocular depth map estimation based on convolutional neural networks (CNNs) has attracted the attention of researchers. Supervised depth map estimation requires massive amounts of labeled data, and the resulting models generalize poorly. Unsupervised learning has therefore become a successful solution for depth estimation without extra depth ground truth. The typical unsupervised pipeline transforms depth estimation into view reconstruction: the target view is reconstructed from the currently estimated depth map and the camera pose between adjacent views, and the photometric error between the reconstructed view and the original target view constrains the network training, iteratively optimizing the parameters. However, such methods use only the current frame or adjacent frames to estimate depth, without global or geometric optimization. Accordingly, this thesis proposes a depth estimation method that combines a CNN with a traditional simultaneous localization and mapping (SLAM) algorithm to estimate depth from unstructured video sequences. The unsupervised learning framework consists of three modules: a depth map estimation network, camera pose optimization, and target view reconstruction.

Existing depth estimation networks tend to ignore important image details such as shape and edges. This thesis introduces an attention model into the depth estimation network to preserve the details of the depth map. The attention model re-weights different points of the global feature map: with attention applied to both, the encoder extracts and preserves multi-scale features while the decoder recovers more detailed depth features, helping the network maintain object shapes and sharpen the edges of the depth map. To take higher-level context into account, this thesis additionally uses dilated convolution to expand the receptive field without changing the spatial size of the feature map, which further improves the accuracy of the depth map; the dilation rate that maximizes the receptive field of the convolution kernel is determined experimentally. Experiments show that this method not only improves the accuracy of the estimated depth map over the methods above but also surpasses most CNN-based methods. Under standard evaluation metrics, the absolute relative difference, squared relative difference, root mean squared error, and logarithmic root mean squared error are effectively reduced, and the three threshold-based accuracy rates are effectively improved, which verifies the effect and advantages of the local-global dual optimization mechanism.

To address the low accuracy and poor generalization of deep-learning-based camera pose estimation, this thesis introduces the ORB-SLAM algorithm into the unsupervised learning framework. The module is embedded into the view reconstruction framework, where the reprojection error is minimized and all frames are used to optimize the camera pose globally; this improves the quality of the reconstructed view and, in turn, the depth map. Experimental results show that the camera pose optimized by ORB-SLAM is very close to the ground truth, which effectively helps the framework generate more accurate reconstructed views and optimize the depth map.
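To make the view-reconstruction pipeline above concrete, the following is a minimal sketch in PyTorch (illustrative only, not the thesis code; the tensor shapes, the 4x4 target-to-source pose convention, and the function names are assumptions): the predicted depth back-projects the target pixels into 3D, the relative camera pose and the intrinsics reproject them into the source view, and bilinear sampling yields a reconstructed target view whose photometric error is used as the training loss.

    import torch
    import torch.nn.functional as F

    def reconstruct_target(src_img, tgt_depth, T_tgt_to_src, K):
        """Warp src_img into the target frame (a sketch, not the thesis code).
        src_img:      (B, 3, H, W) source view
        tgt_depth:    (B, 1, H, W) depth predicted for the target view
        T_tgt_to_src: (B, 4, 4) relative pose, target -> source (e.g. from ORB-SLAM)
        K:            (B, 3, 3) camera intrinsics
        """
        B, _, H, W = src_img.shape
        device = src_img.device

        # Pixel grid of the target view in homogeneous coordinates.
        ys, xs = torch.meshgrid(torch.arange(H, device=device),
                                torch.arange(W, device=device), indexing="ij")
        ones = torch.ones_like(xs)
        pix = torch.stack([xs, ys, ones], dim=0).float().view(1, 3, -1).expand(B, 3, -1)

        # Back-project target pixels to 3D camera coordinates using the predicted depth.
        cam = torch.inverse(K) @ pix * tgt_depth.reshape(B, 1, -1)

        # Transform the points into the source camera frame and project with K.
        cam_h = torch.cat([cam, torch.ones(B, 1, cam.shape[2], device=device)], dim=1)
        src_cam = (T_tgt_to_src @ cam_h)[:, :3]
        src_pix = K @ src_cam
        src_pix = src_pix[:, :2] / (src_pix[:, 2:3] + 1e-7)

        # Normalize to [-1, 1] for grid_sample and bilinearly sample the source view.
        u = 2.0 * src_pix[:, 0] / (W - 1) - 1.0
        v = 2.0 * src_pix[:, 1] / (H - 1) - 1.0
        grid = torch.stack([u, v], dim=-1).view(B, H, W, 2)
        return F.grid_sample(src_img, grid, padding_mode="border", align_corners=True)

    def photometric_loss(src_img, tgt_img, tgt_depth, T_tgt_to_src, K):
        # L1 photometric error between the reconstructed and the real target view.
        warped = reconstruct_target(src_img, tgt_depth, T_tgt_to_src, K)
        return (warped - tgt_img).abs().mean()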
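The two network-level ideas above, attention that re-weights points of the feature map and dilated convolution that enlarges the receptive field while keeping the spatial size, can be sketched as follows (PyTorch; a squeeze-and-excitation style channel attention and a dilation rate of 2 are illustrative assumptions, not the exact architecture of the thesis).

    import torch
    import torch.nn as nn

    class ChannelAttention(nn.Module):
        """Global pooling -> per-channel weights in (0, 1) -> re-weighted features."""
        def __init__(self, channels, reduction=8):
            super().__init__()
            self.fc = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),                 # global context of the feature map
                nn.Conv2d(channels, channels // reduction, 1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // reduction, channels, 1),
                nn.Sigmoid(),
            )

        def forward(self, x):
            return x * self.fc(x)                        # same shape, re-weighted

    class DilatedAttentionBlock(nn.Module):
        """Dilated 3x3 conv keeps H x W (padding == dilation); attention re-weights the result."""
        def __init__(self, channels, dilation=2):
            super().__init__()
            self.conv = nn.Conv2d(channels, channels, kernel_size=3,
                                  padding=dilation, dilation=dilation)
            self.relu = nn.ReLU(inplace=True)
            self.attn = ChannelAttention(channels)

        def forward(self, x):
            return self.attn(self.relu(self.conv(x)))

    # Usage: a 64-channel feature map keeps its 32x32 spatial size.
    feat = torch.randn(2, 64, 32, 32)
    out = DilatedAttentionBlock(64, dilation=2)(feat)
    assert out.shape == feat.shape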
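For the pose module, the sketch below shows how camera poses estimated by ORB-SLAM could feed the view reconstruction in place of a learned pose network (the TUM-style trajectory format, the quaternion ordering, and the helper names are assumptions; relative_pose() produces the target-to-source transform expected by reconstruct_target() in the first sketch, after conversion to a torch tensor).

    import numpy as np

    def quat_to_rot(qx, qy, qz, qw):
        # Rotation matrix from a unit quaternion (x, y, z, w ordering as in TUM files).
        return np.array([
            [1 - 2*(qy*qy + qz*qz), 2*(qx*qy - qz*qw),     2*(qx*qz + qy*qw)],
            [2*(qx*qy + qz*qw),     1 - 2*(qx*qx + qz*qz), 2*(qy*qz - qx*qw)],
            [2*(qx*qz - qy*qw),     2*(qy*qz + qx*qw),     1 - 2*(qx*qx + qy*qy)],
        ])

    def load_poses(path):
        # Each trajectory line: timestamp tx ty tz qx qy qz qw (camera-to-world).
        poses = {}
        for line in open(path):
            if line.startswith("#"):
                continue
            t, tx, ty, tz, qx, qy, qz, qw = map(float, line.split())
            T = np.eye(4)
            T[:3, :3] = quat_to_rot(qx, qy, qz, qw)
            T[:3, 3] = [tx, ty, tz]
            poses[t] = T
        return poses

    def relative_pose(T_w_tgt, T_w_src):
        # Maps points from the target camera frame into the source camera frame.
        return np.linalg.inv(T_w_src) @ T_w_tgt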
Keywords/Search Tags: CNN, visual SLAM, unsupervised learning, monocular, depth map estimation