
Research On Visual Odometry Based On Fusion Of Spatial And Semantic Information

Posted on: 2022-07-07
Degree: Master
Type: Thesis
Country: China
Candidate: W Fu
Full Text: PDF
GTID: 2518306512476344
Subject: Computer application technology

Abstract/Summary:
Visual odometry estimates the camera trajectory from the frame sequence captured by a camera and is used to achieve autonomous camera positioning. Visual odometry methods can be divided into feature-based methods and direct methods: feature-based methods solve for the pose by matching feature points between two frames to establish a geometric model, which yields high accuracy but requires considerable computational resources; direct methods need no feature-point matching and instead optimize the pose directly with a photometric error model, which is faster to compute but must satisfy the grayscale-invariance assumption, and their accuracy leaves room for improvement. Taking a UAV platform with limited computational resources as the research background, this thesis first studies lightweight semantic segmentation and then proposes a direct-method visual odometry based on the fusion of spatial and semantic information.

First, since current semantic segmentation algorithms generally suffer from complex network structures and huge computational overhead, a lightweight semantic segmentation method based on multi-scale visual feature extraction, LitNet, is proposed; it adopts an overall encoder-decoder network structure. In the encoder, the basic feature-extraction unit is an inverted residual module fused with atrous convolution, and the Mish activation function is used to compensate for feature-extraction accuracy. The resulting feature map feeds a lightweight multi-scale fusion module, in which atrous convolutions with different sampling rates run in parallel to capture the contextual information of the feature map at multiple scales. In the decoder, an up-sampling feature fusion module is designed to up-sample and fuse high- and low-level features simultaneously, obtaining richer semantic and spatial information. Comparative experiments show that the proposed method reaches an accuracy (Mean Intersection over Union) of 69.01% on the CamVid dataset at an average segmentation rate of 25.7 FPS; LitNet thus strikes a better balance between real-time performance and accuracy and has good practical value.

Then, a visual odometry method that fuses spatial and semantic information, SPSVO, is proposed; it consists of local map building, tracking, and sliding-window-based optimization. SPSVO is validated experimentally on the NVIDIA Jetson TX2 platform with the TUM-mono dataset. In terms of accuracy, SPSVO is comparable to ORB-SLAM in scenes with slow rotation and translation, with a root mean square absolute trajectory error (ATE) of only 0.23; in terms of time consumption, SPSVO is significantly better than DSO and ORB-SLAM, reaching 41.5 FPS. SPSVO therefore strikes a better balance between the time consumption and accuracy of visual odometry.
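To make the direct method's photometric error model concrete, the following is a minimal sketch, not the thesis's implementation: given a reference image, per-pixel inverse depths, pinhole intrinsics K, and a candidate pose (R, t), it sums squared intensity differences between reference pixels and their reprojections. All names are illustrative assumptions.

```python
# Minimal sketch of a direct-method photometric error, assuming a pinhole
# camera model and known inverse depth per reference pixel (illustrative
# names; not the thesis's actual code).
import numpy as np

def photometric_error(I_ref, I_cur, pts, inv_depths, K, R, t):
    """Sum of squared intensity differences between reference pixels
    and their reprojections in the current frame under pose (R, t)."""
    K_inv = np.linalg.inv(K)
    h, w = I_cur.shape
    err = 0.0
    for (u, v), rho in zip(pts, inv_depths):
        # Back-project the reference pixel to a 3-D point (depth = 1 / rho).
        p_ref = K_inv @ np.array([u, v, 1.0]) / rho
        # Transform into the current camera frame and project.
        p_cur = K @ (R @ p_ref + t)
        if p_cur[2] <= 0:
            continue  # point behind the camera
        u2, v2 = p_cur[0] / p_cur[2], p_cur[1] / p_cur[2]
        if 0 <= u2 < w - 1 and 0 <= v2 < h - 1:
            # Photometric residual; relies on the grayscale-invariance assumption.
            err += (I_ref[v, u] - I_cur[int(v2), int(u2)]) ** 2
    return err
```

Minimizing this error over (R, t), e.g. with Gauss-Newton on a Lie-algebra parameterization, is what the direct method does in place of feature matching.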
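The multi-scale fusion module described above resembles a lightweight ASPP-style block. Below is a hedged PyTorch sketch in that spirit: parallel atrous convolutions at several sampling rates, each followed by Mish, concatenated and fused by a 1x1 convolution. The class name, sampling rates, and channel sizes are assumptions; the abstract does not specify them.

```python
# Illustrative sketch of a lightweight multi-scale fusion module: parallel
# atrous (dilated) 3x3 convolutions at different sampling rates, fused by a
# 1x1 convolution. Names and rates are assumptions, not from the thesis.
import torch
import torch.nn as nn

class MultiScaleAtrousFusion(nn.Module):
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12)):
        super().__init__()
        # One atrous 3x3 branch per sampling rate; padding=dilation
        # keeps the spatial resolution unchanged.
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.Mish(),  # Mish activation, as used in LitNet's encoder
            )
            for r in rates
        ])
        # 1x1 convolution fuses the multi-scale context into one feature map.
        self.fuse = nn.Conv2d(out_ch * len(rates), out_ch, 1, bias=False)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

# Usage example on a dummy encoder feature map:
# y = MultiScaleAtrousFusion(64, 64)(torch.randn(1, 64, 45, 60))
```

A 1x1 fusion keeps the parameter count low compared with stacking larger kernels, which matches the lightweight design goal.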
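For reference, the Mean Intersection over Union figure quoted above is typically computed from a per-class confusion matrix, as in this small sketch (the function name and ignore handling are illustrative assumptions):

```python
# Hedged sketch of the Mean Intersection over Union (mIoU) metric,
# computed from a confusion matrix over all classes.
import numpy as np

def mean_iou(pred, label, num_classes):
    """pred, label: integer class maps of the same shape."""
    conf = np.zeros((num_classes, num_classes), dtype=np.int64)
    for p, l in zip(pred.ravel(), label.ravel()):
        conf[l, p] += 1
    inter = np.diag(conf).astype(np.float64)
    union = conf.sum(axis=0) + conf.sum(axis=1) - inter
    valid = union > 0  # average only over classes that appear
    return (inter[valid] / union[valid]).mean()
```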
Keywords/Search Tags:Simultaneous localization and mapping, visual odometry, feature point matching, semantic segmentation, nonlinear optimization