Font Size: a A A

Pose Estimation Based On Attention-guided Deep Recurrent Neural Network

Posted on:2022-08-28Degree:MasterType:Thesis
Country:ChinaCandidate:L X TanFull Text:PDF
GTID:2518306737956519Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Self driving and mobile robots have also begun to be used in people's lives with the development of computer vision.A wide range of applications of these technologies require smart devices to perceive their movement through visual sensors.We usually consider this kind of problem of obtaining camera pose through vision,which is usually called Visual Odometry(VO).The traditional visual odometer solves the camera movement through a geometric-based method: first,perform feature extraction,then perform feature matching,and finally resolve the pose through the corresponding pair of matching points.The feature design is very cumbersome,and the matching speed and the accuracy of the feature are in opposition to each other.In recent years,due to the vigorous development of deep learning,some methods for studying visual odometry based on deep learning have emerged,which do not require a cumbersome process which is similar to geometric methods and can realize end-to-end camera pose estimation.This paper proposes a monocular VO framework based on a deep recursive convolutional neural network.Different from the traditional VO,this method directly infers the pose of each camera from the video sequence(continuous multiple frames),with better parallelism.It uses a convolutional neural network to learn the effective feature in the VO problem automatically,and employs a deep recurrent neural network to model the sequential dynamics relationship between the cameras implicitly.Aiming at the problem of poor robustness of visual odometry(VO)in dynamic object scenes,this paper proposes to guide network training by learning time-consistent features using a deep recursive convolutional neural network guided by self-attention.This model can significantly reduce its error in dynamic scenes.This method focuses on universal basic scenes and can alleviate the interference of dynamic features,resulting significant enhancement of model's robustness.Finally,this article describes the composition and principle of the model in detail,and compares it with other methods.In this method,the saliency map visualization verifies that the attention mechanism can be used to guide the network to more attention on geometrically universal scenes and objects.
Keywords/Search Tags:Computer vision, Visual odometry, Deep learning, Feature matching, Pose estimation
PDF Full Text Request
Related items