
Research Of Semantic SLAM Algorithm Based On Stereo Vision

Posted on: 2021-12-19
Degree: Master
Type: Thesis
Country: China
Candidate: S S Zhang
Full Text: PDF
GTID: 2558306917984109
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Visual Simultaneous Localization and Mapping (SLAM) is a key technology for the localization and navigation of mobile robots, and it is now widely used in intelligent robots, autonomous driving, and augmented reality. In traditional visual SLAM, the structure of the scene is usually represented as a set of three-dimensional points, which carry little information and are only weakly discriminative. Semantic SLAM uses semantic information obtained from object recognition to improve the performance of traditional visual SLAM and to build maps that capture an understanding of the environment. This thesis builds a real-time semantic SLAM system that accurately removes dynamic objects and exploits the topological relationships between static objects to achieve high localization accuracy.

The sensors used in visual SLAM are monocular, stereo, and RGB-D cameras. An RGB-D camera measures depth actively and is sensitive to lighting, so it is not suitable for outdoor use. Monocular and stereo cameras measure depth passively by triangulation, but a monocular camera cannot recover the true scale of the environment. A stereo camera has neither drawback and is the closest to the way human eyes observe the world. This thesis therefore builds its visual SLAM system on a stereo camera, rectifying the stereo images and obtaining depth by stereo matching.

Acquiring semantic information is a classification and recognition problem, usually solved by deep-learning-based object detection or semantic segmentation. These methods run in real time only on a GPU, so they cannot be applied directly to a visual SLAM system that runs in real time on a CPU. This thesis designs a multi-object tracking algorithm based on optical flow tracking and template matching and combines it with a deep-learning-based object detector, so that the bounding boxes of the objects in every frame can be obtained on a CPU at 19 fps.

Moving objects seriously degrade the localization accuracy of visual SLAM. The common approach is to use object detection to identify objects that are potentially movable and to exclude them from tracking and localization. However, this approach is too aggressive: it removes many objects that are not actually moving. If static objects are removed in large numbers, too few feature points remain for localization, which causes severe drift. In this thesis, potentially movable objects are first identified from semantic information, and the objects that are actually moving are then distinguished and removed using the epipolar constraint. Experimental results on the KITTI dataset show that the localization accuracy of this algorithm is 13.35% higher than that of ORB-SLAM2.

After dynamic objects are removed, the positions of the static objects and the relative relationships between them are fixed, which is a strong constraint in the three-dimensional world. In conventional optimization, however, each feature is adjusted in its own direction to reduce the overall error function, which makes the result easily affected by noisy points and mismatches. This thesis uses semantic information and semantic data association to distinguish the features belonging to different objects and assigns structural information to the features of the same object. As a result, during pose optimization the features of the same object are optimized consistently, and the structure of the object is preserved while the camera pose is adjusted. Furthermore, because each object would still be adjusted independently, the thesis introduces the constraint that the topological relationships between objects remain invariant, so that the adjustments of the objects constrain one another and the overall structure of the scene is taken into account in pose optimization. This reduces localization drift and error accumulation. Experimental results on the KITTI dataset show that the localization accuracy of the system is 10% higher than that of ORB-SLAM2. Finally, the thesis summarizes the research work and discusses directions for future research.
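The abstract states that depth is obtained by rectifying the stereo images and performing stereo matching. For a rectified pair, that step reduces to triangulation from disparity: a point with disparity d (pixels) lies at depth Z = f·B/d, where f is the focal length in pixels and B is the baseline. The sketch below illustrates only this standard relation and is not the thesis's implementation; the function names and the camera parameters in the example (f = 700 px, B = 0.5 m, principal point (400, 300)) are hypothetical.

```python
def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth Z = f * B / d for a rectified stereo pair (pinhole camera model)."""
    if disparity_px <= 0:
        # Zero disparity means the point is at infinity (or the match is invalid).
        raise ValueError("disparity must be positive to yield a finite depth")
    return focal_px * baseline_m / disparity_px


def triangulate_rectified(u, v, disparity_px, fx, fy, cx, cy, baseline_m):
    """Back-project left-image pixel (u, v) with a known disparity into camera coordinates (metres).

    Assumes rectified images, so a match lies on the same row and disparity = u_left - u_right.
    """
    z = depth_from_disparity(disparity_px, fx, baseline_m)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return x, y, z


if __name__ == "__main__":
    # Hypothetical camera: fx = fy = 700 px, principal point (400, 300), baseline 0.5 m.
    print(triangulate_rectified(470, 300, 35, 700, 700, 400, 300, 0.5))  # (1.0, 0.0, 10.0)
```

Because Z is inversely proportional to d, a fixed one-pixel matching error costs much more depth accuracy for distant points than for nearby ones, which is one reason stereo depth is usually trusted only within a limited range.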
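The dynamic-object step described in the abstract relies on the epipolar constraint: if a point is static, its match x2 in the second frame must lie on the epipolar line l = F·x1, where F is the fundamental matrix between the two frames. The sketch below shows one common way to apply this test to the features of a semantically "potentially movable" object. It assumes F has already been estimated from static background matches; the pixel threshold, the majority-vote rule in `is_object_moving`, and all names are hypothetical choices for illustration, not the thesis's exact criterion.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix3 = Sequence[Sequence[float]]


def epipolar_distance(F: Matrix3, p1: Point, p2: Point) -> float:
    """Distance in pixels from p2 to the epipolar line F @ [p1, 1] in the second frame."""
    x1 = (p1[0], p1[1], 1.0)
    # Epipolar line l = F x1, written as a*u + b*v + c = 0.
    a, b, c = (sum(F[r][k] * x1[k] for k in range(3)) for r in range(3))
    norm = math.hypot(a, b)
    if norm == 0.0:
        # p1 is at the epipole: F x1 = 0, so the constraint holds trivially
        # and this point gives no evidence of motion.
        return 0.0
    return abs(a * p2[0] + b * p2[1] + c) / norm


def is_object_moving(F: Matrix3, matches: List[Tuple[Point, Point]],
                     dist_thresh: float = 1.0, ratio_thresh: float = 0.5) -> bool:
    """Hypothetical rule: an object is 'really moving' when more than ratio_thresh
    of its matched features violate the epipolar constraint by > dist_thresh pixels."""
    if not matches:
        return False
    violations = sum(1 for p1, p2 in matches if epipolar_distance(F, p1, p2) > dist_thresh)
    return violations / len(matches) > ratio_thresh


if __name__ == "__main__":
    # Illustrative F for a pure horizontal camera translation with identity intrinsics:
    # epipolar lines are image rows, so a static point keeps its v coordinate.
    F = [[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
    parked_car = [((10, 20), (12, 20)), ((30, 40), (33, 40))]
    moving_car = [((10, 20), (12, 25)), ((30, 40), (33, 46))]
    print(is_object_moving(F, parked_car))  # False
    print(is_object_moving(F, moving_car))  # True
```

In practice F would be estimated robustly from background matches, for example with OpenCV's `cv2.findFundamentalMat` using RANSAC. Note that a point moving along its own epipolar line still satisfies the constraint, so this test alone cannot detect that kind of motion.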
Keywords/Search Tags: Semantic SLAM, Stereo Vision, Multi-object tracking, Dynamic scene, Topological relationship between objects