Font Size: a A A

Research On Deep Neural Network Model And Algorithm Based On Visual Odometry

Posted on:2021-12-03Degree:MasterType:Thesis
Country:ChinaCandidate:M Y ChenFull Text:PDF
GTID:2518306512487524Subject:Computer technology
Abstract/Summary:PDF Full Text Request
Visual odometry(VO)is a method to obtain the pose of a moving object via visual sensors.It plays an important role in autonomous driving,robot self-positioning,and augmented reality.Due to its low cost and wide application scenarios,it has become a hot topic of discussion and research in the field of computer vision recent years.The traditional geometry-based methods rely heavily on the selection and matching of features.Besides,the traditional monocular VO methods also have the problem of the lack of scale.With the rise of deep learning in,deep neural networks have achieved good results in various visual tasks;so using learning-based methods to solve the VO task has gradually become a new idea.In this paper,a method based on Convolutional Neural Network(CNN)and Long Short-Term Memory(LSTM)is proposed.The framework is an end-to-end neural network.The inputs are continuous monocular RGB pictures,and the outputs are the estimated camera poses.Experiments on the KITTI VO and TUM datasets prove that the method in this paper is more superior in prediction accuracy than traditional geometry-based methods and other deep learning-based methods.The main research work of this thesis is as follows.(1)The visual attention mechanism is introduced into the feature extraction subnet in the visual odometry framework.Two types of visual attention mechanisms for high-resolution and low-resolution feature maps are proposed to improve the feature extraction module.The purpose of adding a visual attention mechanism is to strengthen effective features,and to suppress useless features and noise during feature extraction.Based on this,a visual odometry framework composed of CNN with added visual attention and decoupled LSTM is proposed.Experiments show that the network has higher prediction accuracy than networks and other methods without added visual attention mechanism.(2)Using the context guided feature selection LSTM and bidirectional LSTM to improve the timing modeling subnet in the framework.Based on the feature extraction subnet combined with the visual attention mechanism,we developed two visual odometry network frameworks.The main character of the LSTM based on the context mechanism is to use the hidden state of the LSTM unit at the previous moment to correct the input of the current moment;the main role of Bi-LSTM is to use the past and future information to constrain the unit output at this moment at the same time.Experiments show that deep neural networks combining visual attention mechanism and context mechanism have higher prediction accuracy and smaller error than traditional methods and other deep learning-based methods.(3)A visual odometry system based on deep learning is designed.The input of the system is continuous sequences of pictures,and the output is the pose of the camera at each moment.Users can choose the appropriate pre-trained network to process the data.The system also provides calculation results for various indicators of the estimated data and the ground-truth.
Keywords/Search Tags:Visual odometry, Convolution neural network, LSTM, Visual attention mechanism
PDF Full Text Request
Related items