
Indoor Visual Localization Based On Scene Coordinates Regression

Posted on: 2021-03-18
Degree: Master
Type: Thesis
Country: China
Candidate: C M Li
GTID: 2428330629485315
Subject: Photogrammetry and Remote Sensing
Abstract/Summary:
In recent years, with the rapid development of artificial intelligence (AI), advanced technologies such as mobile robots, augmented reality (AR), and virtual reality (VR) have come within reach. These systems need the pose of their sensors. Visual localization is an efficient way to obtain it: it still works when other localization methods such as the Global Positioning System (GPS) and WLAN fail in indoor environments, and it offers rich information at low cost. It has therefore become an active topic in computer vision. The key to visual localization is establishing correspondences between the image and the map. Traditional visual localization methods rely on sparse features for this, but their accuracy remains low in scenes with viewpoint and appearance changes. Meanwhile, as localization services become more widespread, the demand for map updates is increasing.

To address these problems, this thesis proposes an indoor visual localization method based on scene coordinate regression. Given an image, a convolutional encoder-decoder network infers the corresponding 3D scene coordinate for each pixel. An improved Random Sample Consensus (RANSAC) method is then used to estimate the pose. In addition, after reliable predicted 2D-3D matches are added to the training set, an incremental learning method is applied to fine-tune the scene coordinate regression model and update the map. The thesis first introduces current techniques, then studies the related theory of visual localization, and finally presents the following work to address the shortcomings of traditional visual localization methods.

First, to overcome the weakness of hand-crafted features in visual localization, a fully convolutional encoder-decoder network is proposed that regresses the scene coordinate corresponding to each pixel in the query image for use in pose estimation. The method maps the 3D coordinates corresponding to each 2D pixel of a color image into the BGR values of a scene image. In this way, the correspondences between the image and the map are established directly, without feature detection or feature matching. In the offline stage, the correspondences are learned by training the network, and the model is then fine-tuned using re-projection errors. The experimental results show that the scene coordinate regression model makes good use of the image information, achieves dense prediction of matches between pixels and scene coordinates, and improves localization accuracy.

Second, to update the map in long-term localization tasks, a strategy based on image co-visibility information is designed. Inspired by Simultaneous Localization and Mapping (SLAM), this strategy uses the structural information obtained during localization to update the map. It searches for co-visible 2D points between the query image and the training images, then computes the pose and scene coordinate errors to evaluate the accuracy of the correspondences predicted by the previously trained model. After high-reliability correspondences are added to the training dataset, the new correspondences are learned on the new data with a progressive training strategy, which completes the map update. The experimental results show that the progressive training strategy with co-visibility information learns the structural information of the expanded scene effectively and achieves the map update.
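
Two steps mentioned in the abstract can be illustrated with a minimal Python sketch: storing a 3D scene coordinate as the BGR values of a scene image, and scoring a predicted 2D-3D match by its re-projection error. Everything specific here is an assumption for illustration and is not taken from the thesis: the linear scaling over a scene bounding box, the 8-bit quantization, the X/Y/Z to B/G/R channel order, the pinhole camera model, the 2-pixel threshold, and all function names (`encode_bgr`, `decode_bgr`, `reprojection_error`, `reliable_matches`).

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Pixel = Tuple[float, float]
Match = Tuple[Pixel, Vec3]  # (2D pixel, predicted 3D scene coordinate)


def encode_bgr(xyz: Vec3, lo: Vec3, hi: Vec3) -> Tuple[int, ...]:
    """Map a scene coordinate inside the box [lo, hi] to 8-bit (B, G, R).

    Assumption: X -> B, Y -> G, Z -> R with linear scaling and clamping.
    """
    return tuple(
        round(255 * min(max((c - l) / (h - l), 0.0), 1.0))
        for c, l, h in zip(xyz, lo, hi)
    )


def decode_bgr(bgr: Sequence[int], lo: Vec3, hi: Vec3) -> Tuple[float, ...]:
    """Invert encode_bgr, up to the 8-bit quantization error."""
    return tuple(l + (v / 255) * (h - l) for v, l, h in zip(bgr, lo, hi))


def reprojection_error(pixel: Pixel, xyz: Vec3,
                       R: Sequence[Sequence[float]], t: Sequence[float],
                       intrinsics: Tuple[float, float, float, float]) -> float:
    """Pixel distance between an observed pixel and the projected scene point.

    R and t map scene coordinates into the camera frame (assumed convention);
    intrinsics is (fx, fy, cx, cy) of a pinhole camera without distortion.
    """
    fx, fy, cx, cy = intrinsics
    xc, yc, zc = (
        sum(R[i][j] * xyz[j] for j in range(3)) + t[i] for i in range(3)
    )
    if zc <= 0:  # point behind the camera: treat as an invalid match
        return math.inf
    u = fx * xc / zc + cx
    v = fy * yc / zc + cy
    return math.hypot(u - pixel[0], v - pixel[1])


def reliable_matches(matches: List[Match],
                     R: Sequence[Sequence[float]], t: Sequence[float],
                     intrinsics: Tuple[float, float, float, float],
                     max_err_px: float = 2.0) -> List[Match]:
    """Keep matches whose re-projection error is within a (hypothetical) threshold."""
    return [
        m for m in matches
        if reprojection_error(m[0], m[1], R, t, intrinsics) <= max_err_px
    ]
```

In a pipeline like the one described, a filter of this kind would run after pose estimation, and the surviving matches would be candidates for the expanded training set. The thesis additionally evaluates pose and scene coordinate errors over co-visible points, which this sketch does not model.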
Keywords/Search Tags:Visual localization, Pose estimation, Map update, Deep learning