
Research On Visual SLAM Front-end Based On Multi-dimensional Information Fusion

Posted on: 2024-07-25
Degree: Master
Type: Thesis
Country: China
Candidate: X Luo
GTID: 2542307139958739
Subject: Electronic information

Abstract/Summary:
With the development of various disciplines and technologies, as well as the release of national policies such as the "14th Five-Year Plan", China has seen a surge of enthusiasm for autonomous driving. As a key technology in the perception module of autonomous driving systems, visual SLAM provides real-time pose estimation from continuous image streams while incrementally building a map based on its own positioning. By implementation approach, visual SLAM methods are mainly divided into multi-view geometry methods and deep learning methods, but both have certain limitations. Hybrid methods combine the advantages of the two, compensating for each other's shortcomings and achieving complementary strengths. This thesis designs a hybrid visual SLAM front-end based on multi-dimensional information fusion; the main contributions are as follows:

(1) A hybrid visual SLAM front-end framework based on optical flow is designed. The framework consists of a deep learning module, a filter, and a solver. The LiftFlowNet optical flow network in the deep learning module generates dense optical flow in both the forward and backward directions. Exploiting the consistency between forward and backward optical flow, combined with a globally optimal selection strategy, the filter selects high-precision flow vectors from the dense optical flow as inter-frame matching points. The solver uses these matching points to construct 2D-2D epipolar geometric constraints and solve for the inter-frame pose transformation.

(2) To address the scale inconsistency and unstable fundamental matrix estimation of the previous framework, a hybrid visual SLAM front-end framework that fuses scene optical flow and depth information is designed. This framework also consists of three modules; however, the deep learning module introduces the Monodepth2 network to predict scene depth, enabling the prediction of both optical flow and depth information
simultaneously. The solver uses the matching points and depth information to establish 3D-2D geometric constraints and solves for the inter-frame pose transformation by minimizing the reprojection error through nonlinear optimization. Experimental results show that this method achieves good accuracy, especially in inter-frame translation: across multiple sequences, the average translation error and average relative translation error are only 4.466 m and 0.038 m, respectively, and the relative translation error is only 23.19% of ORB-SLAM2's and 43.68% of Depth-VO-Feat's.

(3) The previous framework resolved scale ambiguity but suffered from poor robustness because it tended to get stuck in local optima. To improve robustness, semantic information is introduced, yielding a hybrid visual SLAM front-end framework that fuses scene flow, depth, and semantics, called the multi-dimensional information fusion-based hybrid visual SLAM front-end. It adds the YOLOv5 network to the deep learning module to extract semantic information, and the filter further removes matching points on dynamic objects from the high-precision optical flow based on that semantic information. Additionally, a validator module is added to the framework, which uses Euclidean distance similarity calculations to verify the results. Across all tested sequences, the average translation error and average relative translation error of this framework are only 4.527 m and 0.05 m, respectively, and its average relative rotation error is only 0.081°, a 94.17% improvement over the previous framework. Moreover, its relative translation errors on sequences 01 and 07 are only 0.199 m and 0.0288 m, respectively. The experiments show that this framework maintains high translation accuracy, outperforms the previous framework and the other tested methods in many respects, and is more robust in low-feature, low-texture, and
high-dynamic scenes. To support SLAM researchers, the core of this framework has been open-sourced on GitHub.

(4) A SLAM evaluation software tool is designed to address the inconvenience of evaluating SLAM algorithms. The software evaluates SLAM performance on the KITTI Odometry dataset and can evaluate SLAM or odometry systems with multiple sensors. Once launched, it automatically runs experimental evaluations, compares multiple methods, and displays reports and figures. To provide relevant support for SLAM researchers, the software has also been open-sourced on GitHub.
Keywords/Search Tags: SLAM, deep learning, multi-view geometry, hybrid, multi-dimensional information