The Research Of Pyramid Pooling Module Based Camera Relocalization

Posted on: 2020-07-18
Degree: Master
Type: Thesis
Country: China
Candidate: H Lin
Full Text: PDF
GTID: 2518306464472294
Subject: Computer technology
Abstract/Summary:
Accurately estimating the pose of an RGB camera is key to many computer vision applications. Systems such as autonomous vehicles and robots rely on knowing their own pose in the scene to act correctly and complete their tasks, and relocalization is one of the most important techniques for obtaining that pose. There are three main approaches to camera relocalization. The first builds a database of images, each labeled with its camera pose. To localize a query image, the query is compared against the database to retrieve similar images, and its pose is then computed from the poses of the retrieved images using feature extraction and matching followed by a pose solver. The second trains a random forest on the training set to predict the scene coordinates of image patches. Given an image, the forest predicts its scene coordinates directly, and the resulting 2D-3D correspondences are used to solve for the final camera pose inside a RANSAC loop. The third is similar to the second, except that the scene coordinates are predicted by a convolutional neural network. Trained with a suitable loss function, such a network can predict good scene coordinates without depth information, so this approach is, to some extent, more adaptable.

To study visual relocalization, we start from camera fundamentals: how a camera works, the relationship among the three coordinate systems used in this thesis, the representation of motion in a 3D environment, and how camera localization is solved. We then review the relocalization problem and the image-based camera relocalization methods of recent years. After that, the proposed improvements are introduced and verified.

The experiments show that the random sample consensus (RANSAC) algorithm is highly robust. Because the pose solver tolerates individual bad predictions, we conclude that the key factor determining the accuracy of camera pose estimation is the overall accuracy of the scene coordinates in the pipeline. To predict scene coordinates more accurately, we run a series of experiments and find that a pre-trained DenseNet-201, adapted by transfer learning and combined with a pyramid pooling module, predicts scene coordinates better than previous work in those experiments. Previous solutions also need two training steps to obtain the scene coordinates, whereas an angle-based loss reduces this to a single step. To improve accuracy by paying more attention to the crucial scene content, this thesis tries a weighted angle-based loss function, aiming to reduce the necessary training steps and obtain better results at the same time. Our work is implemented in C++ and Torch7.
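
The abstract does not give the formula for the weighted angle-based loss, so the following Python sketch is only an illustration under stated assumptions. It assumes a pinhole camera, takes the per-pixel error to be the angle between the viewing ray through a pixel and the predicted scene coordinate mapped into the camera frame, and averages those angles with per-pixel weights. All function names, the weight normalization, and the choice of Python are hypothetical; the thesis itself reports an implementation in C++ and Torch7.

```python
import math

def backproject(u, v, fx, fy, cx, cy):
    """Direction of the viewing ray through pixel (u, v) in the camera frame
    (pinhole model, assumed here)."""
    return ((u - cx) / fx, (v - cy) / fy, 1.0)

def to_camera(R, t, X):
    """Map a scene coordinate X into the camera frame: R * X + t."""
    return tuple(sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3))

def angle_between(a, b):
    """Angle in radians between 3D vectors a and b, via atan2(|a x b|, a . b)."""
    dot = sum(x * y for x, y in zip(a, b))
    cross = (a[1] * b[2] - a[2] * b[1],
             a[2] * b[0] - a[0] * b[2],
             a[0] * b[1] - a[1] * b[0])
    return math.atan2(math.sqrt(sum(c * c for c in cross)), dot)

def weighted_angle_loss(pixels, scene_coords, weights, R, t, intrinsics):
    """Weighted mean angular error (hypothetical definition, not the thesis's).

    pixels       -- list of (u, v) image positions
    scene_coords -- predicted 3D scene coordinate for each pixel
    weights      -- per-pixel importance, assumed positive
    R, t         -- scene-to-camera rotation (3x3 nested list) and translation
    intrinsics   -- (fx, fy, cx, cy)
    """
    fx, fy, cx, cy = intrinsics
    total = 0.0
    for (u, v), X, w in zip(pixels, scene_coords, weights):
        ray = backproject(u, v, fx, fy, cx, cy)
        point = to_camera(R, t, X)
        total += w * angle_between(ray, point)
    return total / sum(weights)

# Toy example: identity pose, unit focal length, principal point at the origin.
R_identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t_zero = [0.0, 0.0, 0.0]
example_loss = weighted_angle_loss(
    pixels=[(0.0, 0.0), (1.0, 0.0)],
    scene_coords=[(0.0, 0.0, 2.0), (0.0, 0.0, 1.0)],
    weights=[1.0, 3.0],
    R=R_identity, t=t_zero, intrinsics=(1.0, 1.0, 0.0, 0.0),
)
```

In this sketch the angle stays well defined even when a predicted point falls behind the camera, where a pixel-space reprojection error would not be meaningful. Whether this matches the thesis's motivation for a single training step is not stated in the abstract.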
Keywords/Search Tags: Relocalization, DenseNet, Pyramid Pooling, Transfer Learning