The SLAM (Simultaneous Localization and Mapping) algorithm gives a robot the ability to perform localization and mapping in an unknown environment. It is one of the key algorithms for realizing autonomous control and artificial intelligence in mobile robots. With the development of mobile robot technology, SLAM algorithms have been widely applied in fields such as autonomous driving, virtual reality, environmental exploration, disaster relief, and warehouse logistics. Due to the unique advantages of monocular cameras, such as low price, simple structure, and easy deployment, monocular visual SLAM algorithms have attracted great attention from researchers at home and abroad. Traditional monocular visual SLAM algorithms, which are based on spatial geometric relationships, estimate camera poses and map point locations through filtering or graph optimization and perform well in ideal environments. However, complex situations that are difficult to model in real environments, such as moving objects and sparse textures, prevent these methods from being directly deployed as mature applications. In recent years, with the rapid development of deep learning, self-supervised depth estimation methods have gradually attracted great attention. Owing to their ability to extract high-level features, these methods can estimate depth directly from a single image even in difficult scenes with moving objects and sparse textures. Therefore, how to use self-supervised depth estimation to improve the estimation accuracy of visual SLAM algorithms has become one of the most cutting-edge research hotspots. We thus focus on the strategy for fusing a depth estimation model with a traditional visual SLAM algorithm. Our contributions are summarized as follows:

1. We thoroughly analyzed the existing self-supervised depth estimation framework and clarified the adverse effects of incorrect depth estimates on model convergence. We then proposed a self-improving depth estimation framework based on the teacher-student paradigm, in which the teacher model and the student model iteratively improve each other in depth estimation and uncertainty estimation. After training, the student model predicts the depth of an image while outputting the corresponding depth uncertainty, which indicates the credibility of the depth estimate for downstream applications.

2. Pointing out the scale inconsistency between the depth estimation model and the traditional visual SLAM algorithm ORB-SLAM3, we proposed a scale alignment strategy. This strategy obtains reasonable virtual stereo coordinates by adjusting the length of the baseline. Furthermore, since the depth uncertainty reflects the accuracy of the depth estimates, we adopt it as an adaptive weight on the covariance matrix in bundle adjustment and propose an uncertainty-aware pseudo-RGB-D SLAM algorithm.

3. Building on the above research, we organically integrated the self-improving depth estimation framework and the uncertainty-aware pseudo-RGB-D SLAM algorithm into a visual SLAM prototype system. By clarifying the overall architecture and internal implementation process of the prototype system, we provide guidance for the implementation of monocular visual SLAM algorithms.
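The two ideas in contribution 2 can be illustrated with a minimal sketch. It assumes the standard pinhole stereo relation (disparity = fx * baseline / depth) for generating virtual right-image coordinates, and a simple diagonal covariance in which the depth-related variance is inflated by the predicted uncertainty; the function names, the chosen baseline, and the specific inflation rule are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def depth_to_virtual_stereo(depth, u, fx, baseline):
    """Map a predicted depth to a virtual right-image x-coordinate
    using the pinhole stereo relation: disparity = fx * baseline / depth.
    Adjusting `baseline` rescales the virtual coordinates, which is the
    lever used to align the depth model's scale with the SLAM map scale."""
    disparity = fx * baseline / depth
    return u - disparity

def uncertainty_weighted_information(sigma_px, depth_uncertainty):
    """Build an information matrix for a (u, v, u_right) stereo observation.
    The depth uncertainty inflates the variance of the depth-bearing
    coordinate, so unreliable depth estimates carry less weight in
    bundle adjustment. The '1 + uncertainty' rule is a placeholder."""
    cov = np.eye(3) * sigma_px ** 2
    cov[2, 2] *= (1.0 + depth_uncertainty)  # down-weight uncertain depths
    return np.linalg.inv(cov)
```

For example, with fx = 500 px, a 0.1 m virtual baseline, and a predicted depth of 10 m, a feature at u = 320 maps to a virtual right coordinate of 315; an observation with depth uncertainty 1.0 contributes only half the information along the depth-bearing axis compared to a fully confident one.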