Research On Semantic Vision SLAM Based On Deep Neural Networks

Posted on: 2022-06-09
Degree: Master
Type: Thesis
Country: China
Candidate: J Lu
Full Text: PDF
GTID: 2518306527469944
Subject: Electronic Science and Technology
Abstract/Summary:
With the rapid development of computer vision and artificial intelligence technologies, especially since the start of the 5G era, indoor intelligent mobile robots have become widely used in fields such as home care, warehousing and logistics, and hotel services. As a key technology that enables intelligent mobile robots to navigate autonomously and plan paths, RGB-D-based visual simultaneous localization and mapping (vSLAM) has attracted wide attention in both industry and academia. Traditional vSLAM relies on the strong assumption that the environment is rigid and static. When dynamic objects are present, vSLAM systems that rely only on geometric features such as points, lines, and surfaces for state estimation cannot effectively eliminate the interference caused by moving objects, so their state estimation and mapping performance degrades greatly. Moreover, lacking semantic information, robots equipped with these vSLAM systems cannot achieve higher-level environmental perception or intelligent interaction. Against this background, the system in this thesis adopts a deep neural network and optimizes vSLAM in several respects, in order to improve the robot's pose estimation and mapping performance in dynamic environments and to give it a greater capacity for interaction. The main research contents are as follows:

(1) Optimize the feature matching process. This thesis uses the pyramid optical flow method to track and match feature points, which avoids the time-consuming computation and matching of feature point descriptors. This improves the real-time performance of the system to a certain extent.

(2) Optimize the standard denoising algorithm, Random Sample Consensus (RANSAC). A multi-stage RANSAC scheme is proposed and combined with epipolar constraints to determine whether feature points are moving. Compared with standard RANSAC, multi-stage RANSAC samples the feature points step by step: it runs the RANSAC process several times with a slightly larger threshold and uses the points retained by each round as the complete sample pool for the next round. On the one hand, the larger threshold reduces the number of floating-point calculations to a certain extent. On the other hand, because each subsequent round works on fewer feature points, the total time consumed by a multi-stage scheme with reasonable thresholds is less than that of standard RANSAC, which improves the real-time performance of the system. In addition, experiments show that the multi-stage scheme retains more inliers while removing outliers, making the system more robust in dynamic environments.

(3) In visual odometry, a semantic segmentation network is introduced to obtain semantic information, which is combined with geometric constraints to eliminate dynamic objects in the environment more completely. This compensates for a shortcoming of the traditional scheme, which relies only on low-dimensional geometric feature information and therefore has difficulty recovering the complete contours of dynamic objects.

(4) Construct a semantic octree map that represents only the static environment. First, the feature points that fall within the contours of dynamic objects are removed frame by frame using semantic information. The static feature points are then stitched together according to the camera pose and combined with the semantic information to generate a semantic point cloud. Finally, a semantic octree map of the indoor environment, free of dynamic objects, is built from this point cloud.

The system test results show that the semantic OctoMap constructed by the vSLAM system in this work lays the foundation for higher-level human-computer interaction and navigation tasks. In addition, because it is lightweight and flexible, the OctoMap can also be used to map large-scale scenes.
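The epipolar check mentioned in item (2) can be illustrated with a minimal sketch. The abstract does not say how the constraint is evaluated, so the code below assumes the common point-to-epipolar-line distance test: a matched point whose distance to its epipolar line exceeds a threshold is treated as possibly moving. The function names, the 1-pixel threshold, and the example fundamental matrix are illustrative assumptions, not details from the thesis, and the fundamental matrix is taken as already estimated (for instance by a RANSAC stage).

```python
import math


def epipolar_distance(F, p1, p2):
    """Pixel distance from p2 to the epipolar line of p1 under F.

    F is a 3x3 fundamental matrix given as nested lists, p1 is a point
    in the previous frame and p2 its match in the current frame.
    """
    x1, y1 = p1
    x2, y2 = p2
    # Epipolar line l = F * [x1, y1, 1]^T, written as a*x + b*y + c = 0.
    a = F[0][0] * x1 + F[0][1] * y1 + F[0][2]
    b = F[1][0] * x1 + F[1][1] * y1 + F[1][2]
    c = F[2][0] * x1 + F[2][1] * y1 + F[2][2]
    norm = math.hypot(a, b)
    if norm == 0.0:
        # Degenerate line: treat the match as violating the constraint.
        return float("inf")
    return abs(a * x2 + b * y2 + c) / norm


def split_static_dynamic(F, matches, threshold_px=1.0):
    """Partition matches by the epipolar test (threshold is an assumption)."""
    static, dynamic = [], []
    for p1, p2 in matches:
        if epipolar_distance(F, p1, p2) <= threshold_px:
            static.append((p1, p2))
        else:
            dynamic.append((p1, p2))
    return static, dynamic


if __name__ == "__main__":
    # Illustrative F for a purely horizontal camera translation with
    # identity intrinsics: static points stay on the same image row.
    F_demo = [[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]]
    demo_matches = [((10.0, 20.0), (15.0, 20.0)),   # same row
                    ((30.0, 40.0), (33.0, 45.0))]   # row shifted by 5 px
    static, dynamic = split_static_dynamic(F_demo, demo_matches)
    print(len(static), len(dynamic))
```

In a full pipeline, points flagged this way would be candidates for removal before pose estimation. The thesis additionally combines this geometric test with semantic segmentation, which the sketch does not cover.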
Keywords/Search Tags: Visual simultaneous localization and mapping, deep neural network, feature matching, state estimation, semantic mapping