Font Size: a A A

Research On Semantic SLAM System Based On Deep Learning

Posted on:2021-07-03Degree:MasterType:Thesis
Country:ChinaCandidate:L WangFull Text:PDF
GTID:2518306476450214Subject:Signal and Information Processing
Abstract/Summary:PDF Full Text Request
Vision-based simultaneous localization and mapping(SLAM)technology,as one of the key technologies to realize fully autonomous mobile robots,has attracted widespread attention from scholars in emerging fields such as autonomous driving and augmented reality.Combining semantic segmentation and visual SLAM,developing semantic SLAM technology based on deep learning has become an important direction to break through the bottleneck of traditional visual SLAM algorithms.Aiming at the requirement of environment perception and real-time,this thesis proposes a semantic SLAM system based on deep learning,which uses deep learning to realize pose estimation,depth estimation and semantic segmentation and combines data mapping to construct a three-dimensional semantic map.The program is tested on the KITTI data set.Compared with traditional SLAM,the positioning accuracy and the environmental perception have been significantly improved.The main contents of this thesis are summarized as follows.Firstly,aiming at the problem that the traditional visual odometry method has poor robustness in pose estimation under complex conditions such as obvious changes in light and severe camera movement,the visual odometry model based on convolutional neural network is designed to realize pose estimation.The feature extraction network composed of depthwise separable convolutions can extract the geometric information in the feature map.Through the fully connected layer module,the estimation result of the camera pose change of the input image pair is obtained.On the basis,the visual odometry model based on convolutional neural network and recurrent neural network is designed using LSTM network to improve the accuracy of pose estimation.Compared with the widely used monocular visual odometry——VISO2-M,the two models proposed in thesis have significantly improved the accuracy of pose estimation and are feasible alternatives to traditional monocular visual odometry.Secondly,aiming at the problems of scale uncertainty and susceptibility to camera motion in traditional monocular depth estimation methods,two depth estimation models based on monocular SLAM are designed to achieve monocular depth estimation.These two models propose improvements based on Res Net-50 and Mobile Net-V2 respectively.By adding a channel attention module,the feature channels that are beneficial to the depth estimation task are enhanced.And the same feature fusion method and Fast Up-projection up-sampling module are used to achieve the depth estimation map with the same size as the input image.The accuracy of the depth estimation of these models is significantly higher than that of the existing monocular depth estimation models while meeting the requirement of real-time performance.Thirdly,aiming at the problem that the current semantic segmentation networks have too much calculation and cannot meet the real-time requirements,a semantic segmentation network based on a lightweight feature extraction network is designed.The semantic segmentation network uses the Xception model for lightweight feature extraction.Through the channel attention module,receptive field expansion module and multi-level feature fusion,a high-precision and fast semantic segmentation network is designed.On the basis,a multi-task prediction network combing semantic segmentation and depth estimation is proposed.By sharing the lightweight feature extraction network model and adding output branches,semantic segmentation and depth estimation are realized.The multi-task prediction network can obtain quite high semantic segmentation accuracy and depth estimation accuracy in outdoor and indoor scenes while meeting real-time requirements.Finally,a semantic SLAM system based on deep learning is designed and constructed.The system realizes the full integration of visual odometry model,depth estimation model and semantic segmentation model.The system uses the point cloud processing library to construct and filter point cloud data for the optimized pose estimation result,depth estimation result and semantic labels to obtain a three-dimensional semantic map with better performance.
Keywords/Search Tags:Visual SLAM, Deep Learning, Visual Odometry, Depth Estimation, Semantic Segmentation
PDF Full Text Request
Related items