Research On 3D Semantic SLAM Based On Depth Estimation With P²Net

Posted on: 2023-11-22
Degree: Master
Type: Thesis
Country: China
Candidate: Z J Zhu
Full Text: PDF
GTID: 2558307100469574
Subject: Mechanical engineering
Abstract/Summary:
Simultaneous Localization and Mapping (SLAM) is one of the most important technologies for mobile robots, and vision-based SLAM has recently become widely used. Current mainstream visual SLAM systems tend to build sparse maps in real time. However, dense and semantic maps are indispensable if robots are to complete high-level tasks in the future. At present, SLAM systems for dense and semantic mapping usually require a stereo camera or an RGB-D camera, which greatly limits their application scenarios. In this paper, we propose a monocular 3D semantic SLAM algorithm based on depth estimation, named MMF-SLAM (Monocular Mask Fusion SLAM). An image depth estimation thread and a loop closure detection thread are added on top of a sparse direct tracking thread. The algorithm takes RGB frames as input and reconstructs a 3D map with semantic information while achieving high localization accuracy. It lays a foundation for mobile robots to complete higher-level intelligent tasks. The specific research is as follows.

Firstly, we propose a visual odometry framework for monocular dense reconstruction. On top of the tracking thread, depth estimation is carried out for each keyframe, and the results of depth estimation and camera tracking are jointly optimized. Inaccurate depth estimates are eliminated by computing the reprojection error according to the co-visibility relation between keyframes. Then, starting from the camera pose obtained by the direct-method odometry, an iterative closest point (ICP) procedure based on nonlinear optimization is carried out from coarse to fine, which yields a more accurate camera pose.

Secondly, surface reconstruction and semantic segmentation are improved on the basis of existing algorithms. For surface reconstruction, the surface elements (surfels) are denoised and the mesh is remeshed, which addresses boundary discontinuity and other problems of 3D reconstruction. For semantic segmentation, the RefineMask algorithm, an improvement on Mask R-CNN, is adopted to obtain more accurate mask boundaries, which benefits the reconstruction of the 3D semantic map.

Finally, a lightweight feature matching loop closure detection (LFM-LCD) framework is proposed to address the accumulated error and the inaccuracy of loop closure detection (LCD) algorithms in visual SLAM. The algorithm consists of three steps: object detection on keyframes, binary classification, and feature matching. A lightweight convolutional neural network is also proposed, in which dilated convolution combined with the self-attention of the Transformer obtains a larger receptive field and can therefore better complete the feature matching task between similar keyframes. The category vector of each keyframe is then obtained by binary classification, and dynamic label weights are added according to the category proportions in the binary classification tree. Compared with several advanced LCD algorithms, the LFM-LCD proposed in this paper shows advantages in low-speed indoor robot SLAM.
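
The abstract does not give the details of the depth-filtering step, so the following is only a minimal sketch of one common way to reject inaccurate depth estimates by reprojection error between two co-visible keyframes. It assumes a pinhole camera with intrinsics (fx, fy, cx, cy), a known relative pose (R_ba, t_ba) from keyframe A to keyframe B, and a known pixel correspondence in B. All function names, the sample layout, and the 2-pixel threshold are hypothetical and are not taken from MMF-SLAM.

```python
import math

def backproject(u, v, depth, K):
    """Lift pixel (u, v) with the given depth to a 3D point in the camera frame."""
    fx, fy, cx, cy = K
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)

def transform(point, R, t):
    """Apply the rigid transform p' = R * p + t (R is a 3x3 nested list)."""
    x, y, z = point
    return tuple(R[i][0] * x + R[i][1] * y + R[i][2] * z + t[i] for i in range(3))

def project(point, K):
    """Project a 3D point to pixel coordinates; None if it lies behind the camera."""
    fx, fy, cx, cy = K
    x, y, z = point
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)

def reprojection_error(pixel_a, depth_a, observed_b, K, R_ba, t_ba):
    """Pixel distance between the predicted and observed location in keyframe B."""
    p_a = backproject(pixel_a[0], pixel_a[1], depth_a, K)
    p_b = transform(p_a, R_ba, t_ba)
    predicted = project(p_b, K)
    if predicted is None:
        return math.inf
    return math.hypot(predicted[0] - observed_b[0], predicted[1] - observed_b[1])

def filter_depths(samples, K, R_ba, t_ba, max_error_px=2.0):
    """Keep only samples (pixel_a, depth_a, observed_b) within the error threshold.

    The threshold is an illustrative assumption, not a value from the thesis.
    """
    return [
        s for s in samples
        if reprojection_error(s[0], s[1], s[2], K, R_ba, t_ba) <= max_error_px
    ]
```

In this sketch a depth estimate survives only if the point it implies lands near its matched pixel in the neighbouring keyframe. A real system would typically aggregate the error over several co-visible keyframes rather than a single pair.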
Keywords/Search Tags:Visual SLAM, 3D Reconstruction, Loop Closure Detection, Deep Learning, Depth Estimation, Instance Segmentation