Study On Interference Cancellation Of Unsupervised Monocular Depth Estimation

Posted on: 2021-04-26  Degree: Master  Type: Thesis
Country: China  Candidate: Z Y Li  Full Text: PDF
GTID: 2428330620976717  Subject: Computer technology
Abstract/Summary:
Scene depth estimation is essential to object detection, recognition and understanding in the 3D world. It is one of the important tasks of computer vision and an indispensable key technology in robot navigation, virtual reality and augmented reality. After comparing the advantages and limitations of lidar, binocular and monocular depth estimation, this thesis adopts a monocular camera as the sensor, which is easy to install, flexible to use and cheap, and proposes an unsupervised learning method that estimates scene depth from video sequences. The aim is to reduce the interference caused by similar regions, illumination changes and occlusion, and thereby improve the accuracy of depth estimation.

First, depth estimation methods based on deep neural networks are compared and analyzed, and the key problems of depth estimation from monocular video sequences are clarified. To cancel the interference caused by similar regions and occluded areas, the proposed method corrects both the input image sequence and the loss function. Image regions with large interference should not take part in network training, so that the accuracy of depth prediction is preserved. The difference between the depth map warped from the source (target) to the target (source) and the predicted depth map of the target (source) is used as the correction weight mask: the larger the difference, the greater the depth prediction error and the smaller the weight with which those pixels participate in the calculation. The mask is used to modify the input image sequence, which reduces the interference seen by the depth estimation network and the camera pose estimation network, and it is also used to correct the image consistency constraint. Interference is thus reduced both at the data source and in the objective function of the network, which yields more accurate depth estimation.

Second, the data structure used in network training is changed to eliminate the interference of the input data. The weight coefficients obtained in each training pass are tracked and recorded, and before an image pair is fed into the network the input data is corrected by multiplying it by these weight coefficients. Two sets of weight masks are therefore designed, one representing the matching relationship from the source to the target and the other from the target to the source. When the same image pair is trained repeatedly, the program dynamically updates the two weight masks to highlight the image areas with fewer interfering pixels, which improves the performance of the pose and depth estimation networks. Because consecutive images in the sequence are progressively related, and pose estimation and depth estimation are coupled, this update scheme not only eliminates the interference between two batches of data but also helps eliminate interference across the whole video sequence.

Finally, two residual convolutional neural networks with an auto-encoder architecture are used to implement the camera pose estimation task and the depth estimation task. Experiments are carried out on the KITTI dataset, a widely used large-scale autonomous driving dataset, and the predicted depth maps of typical scenarios are presented, which supports the validity of the proposed method. Compared with current unsupervised depth estimation methods for monocular video sequences, the proposed method also achieves fairly good results. The test results indicate that eliminating the interference from illumination change, occlusion and similar regions helps ensure the accuracy of pose prediction and depth estimation.
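
The correction weight mask described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: the normalized depth difference used as the weight, the moving-average rule in `update_mask`, the `momentum` parameter, the weight-sum normalization of the loss, and all function names. Depth maps and images are represented as nested lists of positive floats to keep the example dependency-free; a real implementation would use tensor operations.

```python
EPS = 1e-7  # guards against division by zero


def weight_mask(warped_depth, predicted_depth):
    """Per-pixel weights in [0, 1]: larger depth disagreement gives a smaller weight.

    Assumed form: 1 - |d_warped - d_predicted| / (d_warped + d_predicted).
    """
    mask = []
    for warped_row, predicted_row in zip(warped_depth, predicted_depth):
        row = []
        for d_w, d_p in zip(warped_row, predicted_row):
            diff = abs(d_w - d_p) / (d_w + d_p + EPS)
            row.append(1.0 - diff)
        mask.append(row)
    return mask


def update_mask(stored, current, momentum=0.5):
    """Blend the stored mask of an image pair with the newly computed one.

    Hypothetical update rule (exponential moving average) for repeated
    training on the same image pair.
    """
    return [
        [momentum * s + (1.0 - momentum) * c for s, c in zip(s_row, c_row)]
        for s_row, c_row in zip(stored, current)
    ]


def apply_mask(image, mask):
    """Correct the input image by multiplying each pixel by its weight."""
    return [[p * m for p, m in zip(p_row, m_row)]
            for p_row, m_row in zip(image, mask)]


def weighted_photometric_loss(target, reconstructed, mask):
    """Mask-weighted mean absolute intensity error (image consistency term)."""
    total = 0.0
    weight_sum = 0.0
    for t_row, r_row, m_row in zip(target, reconstructed, mask):
        for t, r, m in zip(t_row, r_row, m_row):
            total += m * abs(t - r)
            weight_sum += m
    return total / (weight_sum + EPS)


# Usage: one mask per direction (source-to-target and target-to-source)
# would be kept for each image pair; only one direction is shown here.
warped = [[2.0, 4.0], [5.0, 10.0]]
predicted = [[2.0, 12.0], [5.0, 10.0]]
mask_s2t = weight_mask(warped, predicted)
mask_s2t = update_mask([[1.0, 1.0], [1.0, 1.0]], mask_s2t)
corrected_input = apply_mask([[0.2, 0.8], [0.4, 0.6]], mask_s2t)
```

In this sketch a pixel whose warped and predicted depths agree keeps a weight of 1, while a pixel that is likely occluded or mismatched contributes less both to the network input and to the loss.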
Keywords/Search Tags:Monocular depth estimation, Unsupervised, Camera pose, Auto-encoder, Residual convolutional network