
Research On Key Visual SLAM Algorithms Based On Image Semantic Information

Posted on: 2021-04-02    Degree: Doctor    Type: Dissertation
Country: China    Candidate: W Zhang    Full Text: PDF
GTID: 1368330602490080    Subject: Computer application technology
Abstract/Summary:
In the research field of mobile robots, Simultaneous Localization and Mapping (SLAM) is a key technology for the autonomous localization and path planning of robots, and image-based visual SLAM has been studied intensively because of the richness of the data it collects and the universality of its application scenarios. At present, because it lacks high-level semantic information about images, the visual SLAM framework based on multiple-view geometry and nonlinear optimization can no longer satisfy people's requirements for intelligence and interaction. Semantic SLAM methods, which combine visual SLAM with image semantics, show broader application prospects than traditional visual SLAM methods and are gradually attracting the attention of researchers.

This paper studies how to integrate the semantic information of images into traditional visual SLAM methods. With the help of image semantic information, the performance of traditional visual SLAM can be improved, and the robot can better recognize its location and understand its environment, thereby improving the human-computer interaction capabilities of intelligent robots and better serving humans. Based on computer vision technology, the purpose of this paper is to study algorithms that combine image semantic information with visual SLAM. The research content is divided into four parts: 1) an image feature representation method based on the fusion of image semantic information and a feature matching method based on real-time semantic segmentation are adopted, and the camera pose is calculated from the matching results; 2) on the basis of the local and global semantic information of images, a loop closure detection framework built on convolutional neural networks is constructed, which determines loop relationships according to the similarity scores between the input images and corrects the accumulated camera pose error accordingly; 3) the depth information of images is estimated with a CNN-based CGAN architecture combined with the traditional triangulation method to recover the depth value of each pixel; 4) based on the work of the previous parts, image pixels are mapped into the 3D reference space and a dense 3D point cloud semantic map is built according to the semantic information of the 3D points.

The specific research work and innovations of this paper are as follows:

(1) For the feature extraction and matching problems in visual odometry, this paper proposes a visual odometry method based on the fusion of image semantic information. In traditional visual odometry, image features are extracted to calculate the camera poses; because this process lacks image semantic information as supervision, it suffers from large calculation errors. In this paper, the feature description vectors are generated by cropping multi-scale patches around the image features and extracting and fusing their semantic information with a convolutional neural network, which is more robust than traditional descriptors. For feature matching, a matching method based on feature semantic consistency is proposed: the semantic labels of feature points are obtained by a real-time semantic segmentation method, and candidate matches whose semantic labels are inconsistent are discarded, instead of relying only on the distances between description vectors. Finally, according to the feature matching results and the RANSAC method, the camera poses are calculated and the camera's motion trajectory is obtained.
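As an illustration of the semantic-consistency matching step, the minimal Python sketch below discards candidate matches whose endpoints carry different semantic labels before RANSAC pose estimation. It uses ORB descriptors and OpenCV's essential-matrix routines as stand-ins for the learned, semantically fused descriptors described above; the label arrays, function name, and parameters are illustrative assumptions, not the thesis implementation.

```python
import cv2
import numpy as np

def match_with_semantic_consistency(desc1, kpts1, labels1,
                                    desc2, kpts2, labels2, K):
    """Sketch of semantic-consistency matching (hypothetical helper).

    desc/kpts are ORB descriptors and keypoints; labels1/labels2 hold the
    per-feature semantic class from a segmentation network; K is the 3x3
    camera intrinsic matrix.
    """
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(desc1, desc2)

    # Semantic-consistency check: keep a match only if both endpoints
    # carry the same semantic label.
    matches = [m for m in matches
               if labels1[m.queryIdx] == labels2[m.trainIdx]]

    pts1 = np.float32([kpts1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kpts2[m.trainIdx].pt for m in matches])

    # RANSAC essential-matrix estimation and pose recovery.
    E, inliers = cv2.findEssentialMat(pts1, pts2, K,
                                      method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    return R, t
```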
(2) In order to correct the cumulative error in the calculated camera poses, this paper proposes a loop closure detection method based on multi-scale image information fusion using a Siamese network. The purpose of loop closure detection is to eliminate the cumulative error in the camera poses so that they become more accurate; its main research problem is how to represent images accurately and comprehensively and how to calculate the similarity between images. To address the problem that the traditional visual Bag-of-Words model computes image similarity scores only from local image features, which leads to inaccurate results, this paper improves the traditional Siamese network structure by building a VAE and a ResNet on each branch of the network for parallel processing. The VAE extracts the semantic information of local image features, while the ResNet extracts semantic information at the scale of the whole image; the two scales are fused into a single description vector per image, and the similarity score is computed between the fused vectors. Experiments on public datasets show that the proposed method is superior to the traditional BoW loop detection method and can correct the camera poses effectively.
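A minimal PyTorch sketch of such a two-branch descriptor is given below. It pairs a ResNet-18 backbone (whole-image scale) with a small convolutional encoder standing in for the VAE's local-feature branch, fuses the two into one descriptor, and scores image pairs by cosine similarity. The layer sizes and the TwoBranchEncoder/similarity names are illustrative assumptions rather than the architecture used in the thesis.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class TwoBranchEncoder(nn.Module):
    """One Siamese branch: global (ResNet) + local (VAE-like) descriptors fused."""
    def __init__(self, latent_dim=128):
        super().__init__()
        # Global branch: ResNet-18 trunk without its classification head (512-d).
        resnet = models.resnet18(weights=None)
        self.global_branch = nn.Sequential(*list(resnet.children())[:-1])
        # Local branch: small conv encoder standing in for the VAE encoder.
        self.local_branch = nn.Sequential(
            nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, latent_dim),
        )
        self.fuse = nn.Linear(512 + latent_dim, 256)

    def forward(self, x):
        g = self.global_branch(x).flatten(1)          # (B, 512)
        l = self.local_branch(x)                       # (B, latent_dim)
        d = self.fuse(torch.cat([g, l], dim=1))        # fused descriptor
        return nn.functional.normalize(d, dim=1)

def similarity(net, img_a, img_b):
    """Cosine similarity between the fused descriptors of the two inputs."""
    da, db = net(img_a), net(img_b)
    return (da * db).sum(dim=1)
```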
(3) For the pixel depth information required in 3D map construction, this paper proposes an image depth estimation method based on a Conditional Generative Adversarial Network (CGAN) and triangulation. Depth information is essential when building a 3D map. Exploiting the close relationship between image depth and camera parameters, the CGAN is trained by optimizing both the error between the estimated and real depth values and the discriminator's outputs on the two, and produces a dense depth estimate for each image. To handle the scale drift of the estimated depths, real depth values are recovered by triangulating feature matches against adjacent images and are used to fine-tune the CGAN's estimates. Compared with other depth estimation methods based on deep convolutional neural networks, which lack the supervision of effective information, the proposed method uses the supervision of the discriminator and the real depth scale simultaneously, which makes the depth estimation results more accurate.

(4) This paper proposes a 3D dense point cloud map construction algorithm based on point cloud semantic information. Using the camera poses and image depth values obtained in the previous parts, the mapping position of each pixel in the 3D reference space is calculated from the camera model, and the resulting 3D points form a dense point cloud map. The 3D points are organized in a k-d tree, which facilitates regional queries. Based on the image semantic segmentation results produced in the visual odometry, an outlier removal method based on spatial semantic density is proposed, which divides the 3D points into core points and boundary points according to the semantic density of the point cloud. Moreover, when the point cloud map grows too large, points can be filtered according to their semantic labels: 3D points belonging to the background or to object categories that are negligible for the current task can be deleted and only the regions of interest retained, so that the growth of the point cloud is kept under control.
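To make the mapping pipeline of parts (3) and (4) concrete, the sketch below first rescales a network-predicted depth map using sparse triangulated depths, then backprojects labeled pixels into the world frame with the pinhole camera model, and finally keeps only "core" points whose k-d tree neighbourhood contains enough points of the same semantic label. The function names, neighbourhood radius, and thresholds are illustrative assumptions; the thesis's actual CGAN and density criteria are not reproduced here.

```python
import numpy as np
from scipy.spatial import cKDTree

def align_depth_scale(pred_depth, feat_uv, tri_depth):
    """Rescale a predicted depth map with triangulated depths at matched
    feature pixels (median-ratio scale; a simple stand-in for fine-tuning
    the CGAN output)."""
    u = feat_uv[:, 0].astype(int)
    v = feat_uv[:, 1].astype(int)
    pred = pred_depth[v, u]
    valid = (pred > 1e-6) & (tri_depth > 1e-6)
    return np.median(tri_depth[valid] / pred[valid]) * pred_depth

def backproject(depth, labels, K, T_wc):
    """Map every pixel into the 3D reference space with the pinhole model;
    T_wc is the 4x4 camera-to-world pose from the visual odometry."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.ravel()
    x = (u.ravel() - K[0, 2]) * z / K[0, 0]
    y = (v.ravel() - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z, np.ones_like(z)])     # (4, N) homogeneous
    return (T_wc @ pts_cam)[:3].T, labels.ravel()       # (N, 3) world points

def semantic_density_filter(points, labels, radius=0.1, min_same_label=8):
    """Keep core points: those with enough same-label neighbours within the
    given radius; isolated outliers and thin boundaries are dropped."""
    tree = cKDTree(points)
    neighbours = tree.query_ball_point(points, r=radius)
    keep = np.array([sum(labels[j] == labels[i] for j in idx) >= min_same_label
                     for i, idx in enumerate(neighbours)])
    return points[keep], labels[keep]
```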
Keywords/Search Tags:Image semantic information, visual SLAM, visual odometry, loop closure detection, depth estimation