In recent years, with the development of technologies such as autonomous driving and artificial intelligence, mobile robots have gradually entered people's lives. Simultaneous Localization and Mapping (SLAM) is a key technology for realizing the autonomous movement of mobile robots: it achieves self-positioning and map building from the information collected by the robot's own sensors, without prior knowledge of the environment. Visual SLAM, which relies on visual sensors such as cameras, has gradually become the mainstream solution due to its low hardware cost and strong environmental perception ability, and it can operate stably in simple static scenes. Current mainstream visual SLAM systems extract point features from the environment for subsequent positioning and mapping. However, they do not make full use of the semantic information that visual sensors can readily provide, which makes it difficult to support higher-level interactive tasks. In practice, indoor robot operating environments often contain many dynamic objects, which cause feature mismatches that lead to tracking loss and relocalization failure, degrading the accuracy of positioning and mapping. Moreover, indoor environments are mostly man-made scenes with weak texture and sparse point features, which also degrades the positioning accuracy of visual SLAM systems that depend on point features. To improve the positioning accuracy of mobile robots in indoor dynamic environments and to further exploit semantic information in the environment, this paper studies dynamic feature removal methods in depth and extends semantic information to the global SLAM process, constructing a semantic visual SLAM system for indoor dynamic environments. The main research contents of this paper are as follows:

1. Aiming at the problem that visual odometry is prone to mismatches in dynamic scenes, this paper proposes a geometric-semantic collaborative dynamic feature
processing algorithm that combines multi-view geometry with deep learning in the feature matching stage. Building on existing dynamic SLAM algorithms, a parallel object detection thread is added to form a two-stage pipeline: in adjacent keyframes, dynamic features are pre-detected through geometric constraints, and images that exceed a set threshold are sent to the object detection network, which filters out the dynamic features. In addition, an adaptive keyframe selection strategy and an adaptive feature point expansion strategy are proposed for cases where too few feature points remain after dynamic feature processing in continuous, highly dynamic scenes, further improving the robustness of the system. Experiments on the TUM Dynamic Objects dataset and in real scenes show that, compared with ORB-SLAM2, a classic algorithm in the visual SLAM field, the proposed algorithm significantly improves positioning accuracy in dynamic scenes; compared with existing dynamic SLAM algorithms such as DS-SLAM and DynaSLAM, our system achieves better overall performance.

2. The semantic information perceived by the visual sensor is further leveraged at the global scale to assist loop closure detection. In the dynamic feature preprocessing stage, the semantic and position information of objects in each keyframe is extracted, and a recognition database is established to map keyframes to their recognition data. An image representation method that fuses local image features with overall semantic information by weighting is designed, improving on the bag-of-words model of traditional visual SLAM systems: in the loop closure detection thread, the image similarity between the current frame and each loop closure candidate frame is computed from both the local features extracted by the bag-of-words model and the
global object information extracted by the neural network, and a weighting model combines the two into a weighted similarity score. Tests on the OpenLORIS-Scene dataset show that the proposed image representation method achieves higher precision and recall than the traditional bag-of-words model.
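The geometric pre-detection step of contribution 1 can be sketched in a minimal form. The abstract does not specify the exact geometric constraint, so this sketch assumes the common epipolar-constraint check: given matched feature points between adjacent keyframes and the fundamental matrix F relating them, a match whose point-to-epipolar-line distance exceeds a pixel threshold is flagged as a dynamic candidate. The function names and the threshold value are illustrative, not the thesis's actual implementation.

```python
import numpy as np

def epipolar_distances(F, pts1, pts2):
    """Distance of each point in pts2 to its epipolar line l' = F x1.

    F: 3x3 fundamental matrix; pts1, pts2: (N, 2) matched pixel coordinates
    in the previous and current keyframe, respectively.
    """
    ones = np.ones((len(pts1), 1))
    x1 = np.hstack([pts1, ones])      # homogeneous points in frame 1
    x2 = np.hstack([pts2, ones])      # homogeneous points in frame 2
    lines = x1 @ F.T                  # epipolar lines in frame 2, shape (N, 3)
    num = np.abs(np.sum(lines * x2, axis=1))      # |x2 . l'|
    den = np.linalg.norm(lines[:, :2], axis=1)    # line normalization
    return num / den

def flag_dynamic(F, pts1, pts2, thresh_px=1.0):
    """Boolean mask: True where a match violates the epipolar constraint."""
    return epipolar_distances(F, pts1, pts2) > thresh_px
```

In a full pipeline, F would be estimated robustly (e.g. with RANSAC) from the matches themselves, and a keyframe whose fraction of flagged matches exceeds the set threshold would then be passed to the object detection network for semantic filtering.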
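The weighted similarity scoring of contribution 2 can likewise be illustrated with a short sketch. The abstract fixes neither the semantic similarity measure nor the weight, so this sketch assumes cosine similarity between per-class object-count histograms and an illustrative weight of 0.7 on the local-feature term.

```python
import numpy as np

def semantic_similarity(hist_a, hist_b):
    """Cosine similarity between per-class object-count histograms."""
    a = np.asarray(hist_a, dtype=float)
    b = np.asarray(hist_b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def weighted_score(bow_sim, hist_cur, hist_cand, w=0.7):
    """Fuse bag-of-words and semantic similarity into one score.

    bow_sim: local-feature similarity between current frame and loop
    closure candidate (as produced by a BoW model such as DBoW2).
    w: weight on the local-feature term (illustrative value).
    """
    return w * bow_sim + (1.0 - w) * semantic_similarity(hist_cur, hist_cand)
```

For example, a candidate with BoW similarity 0.6 and an identical object histogram scores 0.7 * 0.6 + 0.3 * 1.0 = 0.72, so agreement in global object layout raises a frame above candidates that match only on local features.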