
Object Depth Estimation Based On Visual SLAM

Posted on: 2020-06-08
Degree: Master
Type: Thesis
Country: China
Candidate: A J Wang
Full Text: PDF
GTID: 2518306215954539
Subject: Mechanical and electrical engineering
Abstract/Summary:
Simultaneous Localization and Mapping (SLAM) is a core technology of robot vision. It is widely used for environment perception and navigation during robot motion and is key to realizing fully autonomous mobile robots. As a sensor, a visual camera offers rich information, high flexibility, and low cost, which makes research on vision-based SLAM particularly important. However, in practical SLAM applications such as robot navigation, obstacle avoidance, and autonomous driving, the two-dimensional images acquired by a camera lack depth information and therefore cannot provide accurate three-dimensional position, size, and orientation, which greatly limits their use. Estimating depth information from two-dimensional images is thus of great significance for visual SLAM applications.

This paper focuses on obtaining depth information directly from a two-dimensional image. Compared with traditional algorithms, Convolutional Neural Networks (CNNs) achieve the best depth estimation performance by learning a nonlinear prediction function that maps an image directly to a depth map of the scene. Most recent work obtains depth by supervised training of neural networks, and experimental results confirm its effectiveness for view depth estimation, but such methods depend on acquiring large numbers of images with corresponding per-pixel depth ground truth. The algorithm proposed in this paper is therefore unsupervised: traditional image processing methods from computer vision are integrated into a deep learning framework that requires no depth ground truth, and its performance surpasses current classical algorithms. The main innovations and contributions of this paper are summarized as follows:

1. For monocular video sequences, an unsupervised learning framework combining view synthesis and perceptual loss is proposed. In the conversion network trained for the depth estimation task, the low-level pixel reconstruction error is combined with the perceptual loss on high-level features extracted by a pre-trained loss network to form the total loss function used to adjust the network. Experimental results show that this framework achieves a significant performance improvement in single-image depth prediction.
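The following is a minimal PyTorch sketch of such a combined objective, assuming an L1 pixel term and a frozen VGG-16 feature extractor as the pre-trained loss network; the backbone, layer choice, and loss weights are illustrative assumptions, not the exact configuration used in the thesis.

```python
# Minimal sketch of the combined loss: a low-level pixel error between the
# synthesized and target views plus a perceptual error on features from a
# frozen, pre-trained loss network. (ImageNet normalization of the inputs
# is omitted for brevity; weights and layers are illustrative assumptions.)
import torch
import torch.nn as nn
import torchvision.models as models


class CombinedReconstructionLoss(nn.Module):
    def __init__(self, feature_layer=16, pixel_weight=1.0, perceptual_weight=0.1):
        super().__init__()
        # Frozen VGG-16 features act as the pre-trained "loss network".
        vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features[:feature_layer]
        for p in vgg.parameters():
            p.requires_grad = False
        self.loss_net = vgg.eval()
        self.pixel_weight = pixel_weight
        self.perceptual_weight = perceptual_weight

    def forward(self, synthesized, target):
        # Low-level pixel reconstruction error between the warped (synthesized)
        # view and the real target frame.
        pixel_loss = torch.abs(synthesized - target).mean()
        # High-level feature (perceptual) error computed by the frozen network.
        feat_loss = torch.abs(self.loss_net(synthesized) - self.loss_net(target)).mean()
        return self.pixel_weight * pixel_loss + self.perceptual_weight * feat_loss
```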
2. For the scale ambiguity inherent in monocular video sequences, a solution based on joint learning with binocular stereo video sequences is proposed. The known pose between stereo image pairs is used to resolve the scale ambiguity of depth estimation: the single-view depth estimator and the pose estimator are trained simultaneously so that scene depth and camera motion are constrained to a common real-world scale. Meanwhile, the pose network produces scale-unambiguous inter-frame pose estimates, which provide a good initial pose that is further optimized by DVO. Finally, spatial and temporal consistency constraints are used to jointly optimize depth and pose.

3. Building on the monocular and binocular depth estimation work, adversarial learning in the Generative Adversarial Network (GAN) framework is introduced to further refine depth estimation and visual odometry. The depth and pose estimation networks together serve as the generator, and a convolutional network followed by a Flatten operation serves as the discriminator. The discriminator's loss function is a modified combination of cGAN (Conditional GAN) and WGAN-GP (WGAN with Gradient Penalty): the traditional classifier's binary task is transformed into a regression task, which better suits the characteristics of depth estimation. At the same time, the plausibility of the predicted depth maps is verified through view synthesis, yielding good subjective results.
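Below is a rough sketch of a discriminator-side objective of this kind: the critic is conditioned on an input image (cGAN-style), regularized with a WGAN-GP gradient penalty, and its output is treated as an unbounded regression score rather than a binary classification. What exactly the real and generated samples are, and the weighting, are assumptions for illustration, not the thesis's formulation.

```python
# Sketch of a conditional critic loss with a WGAN-GP gradient penalty.
# `condition` is the conditioning image; `real_sample` / `fake_sample` are the
# real and generated tensors the critic compares (e.g. real target frames vs.
# view-synthesized frames -- an assumption, not specified by the abstract).
import torch


def critic_loss(discriminator, condition, real_sample, fake_sample, gp_weight=10.0):
    # cGAN-style conditioning: concatenate the condition with each sample.
    real_pair = torch.cat([condition, real_sample], dim=1)
    fake_pair = torch.cat([condition, fake_sample.detach()], dim=1)

    # Critic outputs are treated as regression scores (no sigmoid).
    d_real = discriminator(real_pair).mean()
    d_fake = discriminator(fake_pair).mean()

    # WGAN-GP gradient penalty on random interpolates between real and fake pairs.
    alpha = torch.rand(real_pair.size(0), 1, 1, 1, device=real_pair.device)
    interp = (alpha * real_pair + (1.0 - alpha) * fake_pair).requires_grad_(True)
    grads = torch.autograd.grad(
        outputs=discriminator(interp).sum(), inputs=interp, create_graph=True
    )[0]
    gradient_penalty = ((grads.flatten(1).norm(2, dim=1) - 1.0) ** 2).mean()

    # The critic is trained to score real pairs higher than generated pairs.
    return d_fake - d_real + gp_weight * gradient_penalty
```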
Keywords/Search Tags: Deep learning, Visual SLAM, Depth estimation, Unsupervised learning