On Monocular Depth Estimation Based On Unsupervised Learning

Posted on:2021-02-11

Degree:Master

Type:Thesis

Country:China

Candidate:Y H Sun

Full Text:PDF

GTID:2518306032960269

Subject:Control Engineering

Abstract/Summary:

Depth estimation is an important research direction in computer vision. It aims to recover the distance between the objects in a scene and the camera from an image. Depth estimation of road scenes is very helpful for pedestrian detection and autonomous driving. This thesis studies monocular image depth estimation based on unsupervised learning and improves direct visual odometry to further increase the accuracy of depth estimation.

First, a fully convolutional neural network is used to estimate the depth of a monocular image. The network consists of two parts: the encoder uses a residual network to extract image features, and the decoder uses an extended residual network to integrate those features, with upsampling layers that enlarge the feature maps step by step until the estimated depth map is output. The multi-scale fully convolutional network extracts global and local features of the color image simultaneously and outputs a detailed estimated depth map. The weighted sum of the image reprojection error and the image smoothness error is then used as the loss function for training. During image reprojection, the error in the estimated depth can be converted into an image similarity error, which is the basis of unsupervised learning. The smoothness error encodes prior knowledge; it effectively avoids overfitting of the estimated depth and makes the resulting depth map clear and smooth. Cross-validation is used to select the weight coefficients, which allows suitable hyperparameters to be found quickly. Experiments show that the best results are obtained when the weight of the reprojection error is 0.95.

Second, an improved direct method is proposed. Unsupervised monocular learning requires estimating the camera pose transformation between frames, and low-texture regions and moving objects such as vehicles strongly interfere with the camera pose calculation. The method first compares the pixel changes between consecutive frames and then computes a mask of the low-texture regions and moving-object regions in the image. Applying this mask to the Jacobian matrix of the direct method yields more accurate pose estimation and, in turn, more accurate depth estimation.

Finally, the output of the network model is analyzed and evaluated on the KITTI test dataset. The algorithm used in this thesis effectively recovers the depth map of the scene from the input color image and outperforms some mainstream monocular depth estimation algorithms in depth accuracy and other metrics. A comparison of two experimental groups further shows that the improved direct visual odometry yields significantly more accurate odometry and depth maps than the classical direct visual odometry.
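The weighted training loss described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the function names, the L1 photometric difference, the first-order gradient smoothness term, and the complementary weight `1 - w_reproj` for the smoothness term are all assumptions. The abstract only states that the loss is a weighted sum and that a reprojection weight of 0.95 worked best.

```python
import numpy as np


def reprojection_error(target, reprojected):
    """Mean absolute photometric difference between the target image and
    a neighbouring frame warped into the target view (assumed L1 form)."""
    return float(np.mean(np.abs(target - reprojected)))


def smoothness_error(depth):
    """Mean absolute first-order depth gradient, horizontal plus vertical
    (an assumed, simple smoothness prior)."""
    dx = np.abs(depth[:, 1:] - depth[:, :-1])
    dy = np.abs(depth[1:, :] - depth[:-1, :])
    return float(dx.mean() + dy.mean())


def total_loss(target, reprojected, depth, w_reproj=0.95):
    """Weighted sum of the two terms. Giving the smoothness term the
    weight (1 - w_reproj) is an assumption, not stated in the abstract."""
    return (w_reproj * reprojection_error(target, reprojected)
            + (1.0 - w_reproj) * smoothness_error(depth))
```

In a real training pipeline these terms would be computed on tensors in an autodiff framework so that gradients reach the depth network; the NumPy version only shows how the two errors are combined.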

Keywords/Search Tags:

Depth estimation, Fully convolutional neural network, Direct visual odometry, Computer vision

Related items

1	Monocular Depth Estimation Based On Convolutional Neural Network
2	Deep Learning And Traditional Vision SLAM Based Monocular SLAM
3	Research On Depth Information Estimation For Computer Vision
4	Design And Implementation Of Semi-direct Based Monocular Visual Odometry
5	Research On Unsupervised Monocular Vision Based Depth Estimation Algorithm
6	Robust Depth Estimation Techniques In Computer Stereo Vision
7	The Recovery Of Target Distance From A Single Gray-scale Image
8	Neural Network Based Feature Point Detection Method For Perspective Optimization
9	Pose Estimation Based On Attention-guided Deep Recurrent Neural Network
10	Full Range Stereo Vision System Based On A Single Camera Visual Odometry