
Research On Visual Odometry Algorithm Based On Deep Learning

Posted on: 2022-07-04
Degree: Master
Type: Thesis
Country: China
Candidate: W K Xu
GTID: 2518306347983099
Subject: Master of Engineering
Abstract/Summary:
Visual odometry (VO), the front end of the SLAM (Simultaneous Localization and Mapping) framework, is a broad research field that involves image depth estimation and pose estimation. Put simply, a video is a dynamic signal formed by accumulating several frames, which persistence of vision in the human eye turns into apparent motion; the changes between consecutive frames in turn reflect the motion of the camera, that is, how much it translates and rotates. Visual odometry is used to estimate this change in the camera's position and orientation. In summary, visual odometry mainly consists of the following steps: take an image sequence as input; extract image features, either by hand-crafted means or by machine learning; match the features algorithmically; track the matched features and feed the images into a network for pose estimation; and finally optimize the feature and pose values for subsequent mapping.

In recent years, deep learning has become a popular and effective tool for visual odometry. It removes the rigid dependence on camera parameters and obtains the final result by training the branch networks independently or jointly. In view of the shortcomings of traditional algorithms and of some existing deep learning methods, and considering the limited accuracy of the GANVO depth generation network and the long training time of its pose network, this thesis uses an end-to-end generative adversarial network as the main framework, adds an attention mechanism to it, and replaces the LSTM with a GRU network to shorten the training time.

This thesis completes the following work:

1. Following the overall idea of GANVO, it examines, from the perspectives of theory and feasibility, how to integrate the existing network frameworks in a suitable way.
2. A self-attention mechanism is inserted into the second and fourth layers of the generation network. The attention map is multiplied element-wise with the corresponding pixels to obtain the target feature map of the self-attention network, and the output attention map is passed on to the convolutional layers for further feature extraction and learning to recover more detail. The generated depth map is then analyzed. Averaged over several runs, the RMSE is 5.437, an improvement of 0.011 over the original GANVO.
3. The original CNN+LSTM network is replaced with a CNN+GRU network. LSTM and GRU are similar in nature, but GRU has only two gates where LSTM has three, which reduces the training time. Averaged over several runs of this branch, the final time is 29.048 milliseconds per frame on sequence 9 and 25.158 on sequence 10 (following common practice in the literature, where the trajectories of visual odometry methods are generally compared only on sequences 9 and 10, the other sequences are not compared).
4. The camera model and geometric principles are used to synthesize a fake view. The discriminator compares the generated fake image with the original real image, the parameters of the training network are adjusted continuously, and the output is fed back to the generation network to further adjust its parameters until the depth map metrics reach a certain accuracy. Repeating this process improves the output accuracy of the entire network.
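
Point 3 of the abstract attributes the shorter training time to GRU having fewer gates than LSTM. The sketch below illustrates why that can matter by counting the trainable parameters of one recurrent layer under the textbook formulations: an LSTM cell has three gates plus a candidate cell state (four affine blocks), a GRU cell has two gates plus a candidate hidden state (three blocks). The function names and the layer sizes are hypothetical and chosen for illustration only; they are not taken from the thesis.

```python
# Illustrative sketch only: parameter counts for a single recurrent layer
# under the textbook LSTM/GRU formulations, with one bias vector per block.
# The layer sizes used below are assumptions, not values from the thesis.


def recurrent_params(input_size: int, hidden_size: int, num_blocks: int) -> int:
    """Count parameters of a cell built from `num_blocks` affine blocks.

    Each block maps the concatenation [x_t, h_{t-1}] to a vector of length
    hidden_size, so it holds hidden_size * (input_size + hidden_size)
    weights plus hidden_size biases.
    """
    per_block = hidden_size * (input_size + hidden_size) + hidden_size
    return num_blocks * per_block


def lstm_params(input_size: int, hidden_size: int) -> int:
    # 3 gates (input, forget, output) + 1 candidate cell state = 4 blocks
    return recurrent_params(input_size, hidden_size, 4)


def gru_params(input_size: int, hidden_size: int) -> int:
    # 2 gates (update, reset) + 1 candidate hidden state = 3 blocks
    return recurrent_params(input_size, hidden_size, 3)


if __name__ == "__main__":
    x, h = 1024, 512  # hypothetical CNN feature size and hidden size
    n_lstm, n_gru = lstm_params(x, h), gru_params(x, h)
    print(f"LSTM parameters: {n_lstm}")
    print(f"GRU parameters:  {n_gru}")
    print(f"GRU / LSTM ratio: {n_gru / n_lstm}")
```

Under these assumptions a GRU layer always has three quarters of the parameters of an LSTM layer of the same size, and roughly the same fraction of multiply-adds per time step. Frameworks that store two bias vectors per block give different absolute counts but the same ratio. How much of this saving shows up as wall-clock training time depends on the implementation and hardware.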
Keywords: Visual Odometry, Attention Module, Gated Recurrent Unit, Camera Model, Deep Learning