
Research On Monocular Visual Localization Algorithm Based On Joint Self-Supervised Learning

Posted on: 2024-05-26
Degree: Master
Type: Thesis
Country: China
Candidate: R Q Li
GTID: 2568307079954409
Subject: Information and Communication Engineering
Abstract/Summary:
Visual localization technology extracts the absolute or relative position of a moving camera from the images or video the camera captures. It is a core technology for autonomous mobile robots and autonomous driving, with significant academic research and engineering application value. Visual odometry (VO), an important visual localization technique, incrementally builds the global trajectory of the camera's motion by estimating the inter-frame motion between adjacent images. Traditional geometry-based VO methods require a rigorous camera calibration process, and their localization accuracy depends heavily on handcrafted feature detection and matching. Deep learning-based monocular VO methods do not need handcrafted features, but their accuracy and generalization ability are limited by factors such as the backbone network structure and the training dataset, and they still suffer from the scale ambiguity that affects all monocular VO. This thesis addresses these issues of monocular VO; its main contributions are as follows.

1. Design and validation of SwinVO, a supervised monocular VO algorithm based on globally aware optical flow. SwinVO removes the dependence of traditional methods on handcrafted features and overcomes the limitation of deep learning methods whose backbone networks cannot provide a global receptive field during feature extraction. The optical flow extraction module is designed around local pixel motion consistency, and a shifted-window self-attention mechanism builds a hierarchical perception of global optical flow, from which high-precision inter-frame camera poses are output. Compared with GFS-VO, SwinVO improves translation and rotation accuracy on the KITTI dataset by 41.65% and 49.56%, respectively, and maintains good generalization ability in cross-dataset validation.

2. Design and validation of UnSwinVO, a self-supervised monocular VO algorithm built on SwinVO that jointly uses optical flow and depth. UnSwinVO addresses the high dataset acquisition cost, overfitting, and scale ambiguity commonly encountered by supervised algorithms. With an added depth estimation module, the optical flow extraction and pose prediction modules jointly predict inter-frame optical flow and camera poses, which are used for differentiable view synthesis; an adaptive mask excludes regions that violate the static-scene assumption, and the result forms the self-supervised signal. During training, the networks are trained simultaneously in a self-supervised manner without ground truth, and each network can be used independently at test time. For pose prediction, UnSwinVO improves translation and rotation accuracy on the KITTI dataset by 41.79% and 56.57%, respectively, compared with the self-supervised method CADepth. For depth estimation, it reaches a prediction accuracy of 98.4%, outperforming even methods that use target images for training, such as UnOS, and it generalizes well.

In summary, this thesis progressively designs two deep learning-based monocular VO algorithms, evaluates their localization accuracy, depth estimation accuracy, and generalization performance through extensive qualitative and quantitative experiments, and demonstrates the feasibility and advantages of the proposed algorithms.
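How VO incrementally builds a global trajectory from inter-frame motion can be illustrated with a minimal sketch. It assumes each inter-frame estimate is a 4 × 4 homogeneous transform [R | t] of the new camera frame relative to the previous one. The helper names `se3_yaw_translate` and `accumulate` are hypothetical, and rotation is restricted to yaw about the camera y axis only to keep the example short. This is not code from the thesis.

```python
import math


def matmul4(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def se3_yaw_translate(yaw, tx, ty, tz):
    """Hypothetical helper: homogeneous transform [R | t] whose rotation R is `yaw`
    radians about the camera y axis and whose translation is t = (tx, ty, tz)."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [
        [c, 0.0, s, tx],
        [0.0, 1.0, 0.0, ty],
        [-s, 0.0, c, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def accumulate(relative_poses):
    """Chain inter-frame poses T_{k,k+1} into global poses T_{0,k+1} = T_{0,k} @ T_{k,k+1}."""
    pose = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]  # T_{0,0} = identity
    trajectory = [pose]
    for rel in relative_poses:
        pose = matmul4(pose, rel)
        trajectory.append(pose)
    return trajectory


# Usage: four steps of 1 unit straight ahead along the camera z axis.
straight = accumulate([se3_yaw_translate(0.0, 0.0, 0.0, 1.0)] * 4)
print([round(straight[-1][i][3], 6) for i in range(3)])  # global position of the last frame
```

Because each step multiplies onto the previous global pose, an error in any single inter-frame estimate carries over into every later pose, which is why inter-frame accuracy matters for the whole trajectory.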
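The scale ambiguity of monocular VO follows from pinhole projection. If scene depth and camera translation are multiplied by the same factor, every pixel observation stays the same, so images alone cannot determine metric scale. The sketch below assumes a pinhole camera with made-up intrinsics and a pure translation between two views. The function names are hypothetical.

```python
import math


def project(point, fx, fy, cx, cy):
    """Pinhole projection of a 3-D point (x, y, z) in camera coordinates to pixels (u, v)."""
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)


def observe(point, translation, fx=700.0, fy=700.0, cx=600.0, cy=180.0):
    """Pixels of a static point seen from camera 1 and from camera 2, whose centre is
    displaced by `translation` (no rotation, for brevity). Intrinsics are illustrative."""
    first = project(point, fx, fy, cx, cy)
    in_second = tuple(p - t for p, t in zip(point, translation))  # point in camera-2 coordinates
    second = project(in_second, fx, fy, cx, cy)
    return first, second


point, translation = (2.0, 1.0, 10.0), (0.5, 0.0, 1.0)
for scale in (1.0, 3.0):
    scaled_point = tuple(scale * c for c in point)
    scaled_translation = tuple(scale * c for c in translation)
    print(scale, observe(scaled_point, scaled_translation))  # same pixels for both scales
```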
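The shifted-window self-attention in SwinVO can be illustrated on a small grid of token indices. Attention is computed only inside non-overlapping M × M windows. The next layer cyclically shifts the feature map by M/2 before partitioning, so the new windows straddle the old window boundaries and information propagates across them. This follows the general Swin Transformer scheme and is an assumption about SwinVO's internals. The sketch omits the attention computation itself and the mask Swin applies to wrapped-around tokens, and all function names are hypothetical.

```python
def window_partition(grid, m):
    """Split an H x W grid into non-overlapping m x m windows, listed row-major."""
    h, w = len(grid), len(grid[0])
    if h % m or w % m:
        raise ValueError("grid height and width must be divisible by the window size")
    return [
        [grid[i][j] for i in range(top, top + m) for j in range(left, left + m)]
        for top in range(0, h, m)
        for left in range(0, w, m)
    ]


def cyclic_shift(grid, s):
    """Roll the grid up and left by s cells (same effect as torch.roll(x, shifts=(-s, -s), dims=(0, 1)))."""
    h, w = len(grid), len(grid[0])
    return [[grid[(i + s) % h][(j + s) % w] for j in range(w)] for i in range(h)]


# A 4 x 4 map of token ids (row * 4 + col), window size M = 2, shift M // 2 = 1.
tokens = [[r * 4 + c for c in range(4)] for r in range(4)]
regular = window_partition(tokens, 2)
shifted = window_partition(cyclic_shift(tokens, 1), 2)
print(regular[0])  # [0, 1, 4, 5]: one window of the regular layer
print(shifted[0])  # [5, 6, 9, 10]: one token from each of four regular windows
```

Stacking regular and shifted layers, together with patch merging between stages as in Swin, is what lets windowed attention build up a progressively global view at a cost that grows linearly with image size.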
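The self-supervised signal described for UnSwinVO can be illustrated with a minimal sketch of view synthesis. Depth and relative pose give a rigid flow. The source image is warped along that flow by bilinear sampling, and a masked L1 photometric error compares the result with the target image. The adaptive mask here is an assumption, not the thesis's definition: a pixel counts as static when the predicted optical flow agrees with the rigid flow within a threshold `tau`. Images are tiny grayscale nested lists. Only the forward pass is evaluated; a deep-learning framework would supply the gradients that bilinear sampling makes available. All names are hypothetical.

```python
import math


def rigid_flow(depth, fx, fy, cx, cy, rot, trans):
    """Flow induced by camera motion on a static scene: back-project each target pixel
    with its depth, apply the target-to-source pose (rot, trans), and re-project."""
    flow = []
    for v, depth_row in enumerate(depth):
        row = []
        for u, z in enumerate(depth_row):
            p = ((u - cx) * z / fx, (v - cy) * z / fy, z)
            q = [sum(rot[i][k] * p[k] for k in range(3)) + trans[i] for i in range(3)]
            row.append((fx * q[0] / q[2] + cx - u, fy * q[1] / q[2] + cy - v))
        flow.append(row)
    return flow


def bilinear_sample(img, x, y):
    """Sample img at a real-valued location, clamping to the image border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x1]
    bottom = (1 - ax) * img[y1][x0] + ax * img[y1][x1]
    return (1 - ay) * top + ay * bottom


def warp(src, flow):
    """Synthesize the target view by reading src at (u, v) + flow(u, v)."""
    return [
        [bilinear_sample(src, u + du, v + dv) for u, (du, dv) in enumerate(flow_row)]
        for v, flow_row in enumerate(flow)
    ]


def static_mask(opt_flow, rig_flow, tau):
    """Assumed adaptive mask: 1 where optical and rigid flow agree within tau pixels, else 0."""
    return [
        [1.0 if math.hypot(o[0] - r[0], o[1] - r[1]) < tau else 0.0 for o, r in zip(o_row, r_row)]
        for o_row, r_row in zip(opt_flow, rig_flow)
    ]


def masked_photometric_loss(target, synthesized, mask):
    """Mean absolute intensity error over the pixels the mask keeps."""
    num = sum(
        m * abs(t - s)
        for t_row, s_row, m_row in zip(target, synthesized, mask)
        for t, s, m in zip(t_row, s_row, m_row)
    )
    den = sum(m for m_row in mask for m in m_row)
    return num / den if den > 0 else 0.0


# Toy frame: unit depth and a target-to-source translation of 1 unit along x,
# so the rigid flow is (1, 0) at every pixel.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
src = [[0.0, 1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0]]
depth = [[1.0] * 4 for _ in range(2)]
rig = rigid_flow(depth, 1.0, 1.0, 0.0, 0.0, identity, [1.0, 0.0, 0.0])
synth = warp(src, rig)
target = [[9.0, 2.0, 3.0, 3.0], [5.0, 6.0, 7.0, 7.0]]  # pixel (0, 0) lies on a moving object
opt = [row[:] for row in rig]
opt[0][0] = (3.0, 0.0)  # its optical flow disagrees with the rigid flow
print(masked_photometric_loss(target, synth, static_mask(opt, rig, tau=0.5)))
```

With the mask, the moving pixel is excluded and the toy loss is 0.0. With an all-ones mask, the same frame gives 1.0, because that pixel's error of 8 is averaged over all 8 pixels.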
Keywords: Visual Odometry, Optical Flow Estimation, Depth Estimation, Self-Attention