Monocular Visual Odometer Based On Unsupervised Learning

Posted on: 2021-04-25
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Chen
GTID: 2428330623459083
Subject: Electronic and communication engineering
Abstract/Summary:
Accurately estimating the position of a moving object is an important task in many computer vision applications. Pose estimation is widely used in areas such as virtual reality (VR) and driverless cars. Depending on the camera used, a visual odometer is either monocular or stereo. With the rapid development of deep learning in recent years, more and more researchers have combined the pose estimation problem with deep learning. The main subject of this thesis is an unsupervised monocular visual odometer based on deep learning that jointly performs depth and pose estimation. Most unsupervised monocular visual odometers suffer from scale ambiguity: the model's output must be rescaled to the true scale of the world before it can be compared with the ground truth (GT). This thesis studies and mitigates this problem. In addition, it optimizes the cost function of the pose estimation network and improves the accuracy of the model. The specific contributions are as follows:

(1) This thesis introduces the basics of convolutional neural networks (CNNs) and analyzes the traditional geometric visual odometer alongside two current mainstream unsupervised visual odometer methods: SfMNet, which jointly trains depth estimation and pose estimation tasks, and GeoNet, which jointly trains depth estimation, pose estimation, and optical flow tasks. It then introduces the datasets used in the experiments and the associated evaluation metrics.

(2) Because a monocular visual odometer must rely on consecutive frames to compute the distance between points in space, the absolute depth of an object cannot be determined, so the resulting motion trajectory has scale ambiguity. A stereo visual odometer, by contrast, computes depth from the known baseline between the two cameras, so triangulation can be performed within a single frame. To restore the absolute scale in the monocular visual odometer, this thesis proposes a neural network framework that is trained on binocular (stereo) data, which provides an additional binocular reference constraint, and is tested and evaluated on monocular data. Because the model is applied to a monocular camera at test time, it is still called an unsupervised monocular visual odometer. Evaluation results on the KITTI dataset show that the proposed model recovers the absolute scale of the world.

(3) Because cumulative error has a large impact on an unsupervised visual odometer, this thesis proposes a new pose cycle consistency loss that makes full use of the spatial information in the training data to bring the estimated motion trajectory closer to the GT, improving the robustness of the model and the accuracy of camera pose estimation. Because Euler angles suffer from gimbal lock, this thesis also adopts the more efficient unit quaternion representation. The input to the pose estimation network is assumed to be three consecutive left-camera frames. Inspired by loop-closure detection in the direct method of traditional geometric visual odometers, the network estimates the poses from the first to the second frame, from the second to the third frame, and from the first to the third frame. These estimated camera poses form a closed loop, which provides a stronger constraint for the model. Compared with the SfMLearner method, the proposed unsupervised VO method significantly reduces the translation and rotation errors of the model.
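The closed-loop constraint in item (3) can be illustrated with a minimal sketch. It is not the thesis's actual loss: the function names, the `(w, x, y, z)` quaternion layout, the pose convention `x_b = R_ab * x_a + t_ab`, the L1 translation term, and the `1 - |<q1, q2>|` rotation term are all assumptions chosen for illustration. The idea shown is only that composing the 1→2 and 2→3 poses should reproduce the directly estimated 1→3 pose.

```python
import math

def quat_normalize(q):
    """Scale a quaternion (w, x, y, z) to unit length."""
    n = math.sqrt(sum(c * c for c in q))
    return tuple(c / n for c in q)

def quat_multiply(a, b):
    """Hamilton product a * b of two quaternions (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )

def quat_rotate(q, v):
    """Rotate a 3-vector v by the unit quaternion q (computes q * v * q^-1)."""
    q_conj = (q[0], -q[1], -q[2], -q[3])
    r = quat_multiply(quat_multiply(q, (0.0, *v)), q_conj)
    return r[1:]

def compose(pose_ab, pose_bc):
    """Chain two relative poses (q, t), assuming x_b = R_ab * x_a + t_ab."""
    q_ab, t_ab = pose_ab
    q_bc, t_bc = pose_bc
    q_ac = quat_normalize(quat_multiply(q_bc, q_ab))
    t_ac = tuple(r + t for r, t in zip(quat_rotate(q_bc, t_ab), t_bc))
    return q_ac, t_ac

def pose_cycle_loss(pose_12, pose_23, pose_13, rot_weight=1.0):
    """Hypothetical cycle loss: compare compose(1->2, 2->3) with the direct 1->3 pose."""
    q_c, t_c = compose(pose_12, pose_23)
    q_13, t_13 = pose_13
    q_13 = quat_normalize(q_13)
    # L1 distance between the composed and the direct translation.
    trans_err = sum(abs(a - b) for a, b in zip(t_c, t_13))
    # q and -q encode the same rotation, so compare via |dot product|.
    dot = abs(sum(a * b for a, b in zip(q_c, q_13)))
    rot_err = 1.0 - min(dot, 1.0)
    return trans_err + rot_weight * rot_err

def quat_from_yaw(angle):
    """Unit quaternion for a rotation of `angle` radians about the y axis."""
    return (math.cos(angle / 2.0), 0.0, math.sin(angle / 2.0), 0.0)

# Toy poses: small yaw plus one unit of forward motion per frame.
pose_12 = (quat_from_yaw(0.1), (0.0, 0.0, 1.0))
pose_23 = (quat_from_yaw(0.2), (0.0, 0.0, 1.0))
consistent_13 = compose(pose_12, pose_23)
drifted_13 = (quat_from_yaw(0.35), (0.1, 0.0, 2.0))

print(pose_cycle_loss(pose_12, pose_23, consistent_13))  # close to zero
print(pose_cycle_loss(pose_12, pose_23, drifted_13))     # clearly positive
```

In a training setting the three poses would come from the pose network and the loss would be written in a differentiable framework; plain Python is used here only to keep the arithmetic visible. Because unit quaternions compose by multiplication, the sketch never passes through Euler angles and so avoids gimbal lock.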
Keywords/Search Tags: Unsupervised Learning, Monocular Visual Odometer, Scale Ambiguity, Pose Estimation, Cycle Consistency Constraint