
Global Features Learning And Absolute Scale Recovery In Monocular Visual Localization

Posted on: 2023-11-01 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: H Zhang | Full Text: PDF
GTID: 1528307316950929 | Subject: Control Science and Engineering
Abstract/Summary:
Ego-motion estimation is fundamental to making mobile robots more intelligent, and it is one of the urgent problems that must be solved before robots can handle a wide range of tasks. Autonomous visual localization methods for mobile robots can be classified into global absolute localization, which depends on a map, and local incremental localization, which does not. Absolute monocular localization requires a geo-tagged map: the absolute ego-motion is obtained by matching image features against the map and computing the relative pose from the matched features. In visual global absolute localization, image feature extraction and matching are the basis of the higher-level tasks, and the available methods include traditional hand-crafted features and deep features based on convolutional neural networks. Traditional features not only require manual selection and design but are also poorly robust to drastic illumination changes, large rotations, and similar disturbances. The deep features proposed in recent years can effectively represent the deep semantic information of images and are more robust to changes such as lighting and rotation; however, deep feature models are hard to interpret and difficult to run in real time.

Unlike map-dependent methods, monocular incremental localization localizes the camera by accumulating poses, so it can be used in unknown environments and is a good complement to global absolute localization. Because monocular cameras are simple to calibrate, low in cost, and easy to install, incremental localization often adopts a monocular configuration, also known as monocular visual odometry. One of the challenges of monocular visual odometry is the scale problem, which includes the scale ambiguity that arises when 3D motion is computed from 2D images and the scale drift caused by the absence of absolute observations. To deal with the scale problem, many researchers use given absolute metric information, such as the camera mounting height, to recover the scale of the monocular visual odometry. Computing the relative height between the camera and the road plane during the robot's motion requires road detection and modeling, but current road detection and modeling solutions have several disadvantages. First, processing color information is usually less robust than processing geometric information and is easily affected by shadows. Second, solutions based on a region of interest cannot use the image information fully and easily introduce errors when occlusions appear in front of the vehicle. Third, deep learning methods tend to place specific requirements on training datasets and demand substantial computational resources.

In summary, one of the challenges of visual absolute localization is feature representation, and one of the challenges of local incremental localization is scale recovery. This paper proposes the following improvements for these two problems.

For deep feature extraction and optimization in monocular absolute localization, a deep feature remapping method based on a pre-trained network is proposed, which models the correlation between features and eliminates redundant information. By reducing the dimensionality of the deep features with our algorithm, and by kernelizing and normalizing the distance matrix, the remapping not only saves computational resources but also reduces the false matches caused by redundant information.
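The abstract does not give the exact remapping formulas, but the pipeline it describes (dimensionality reduction of pre-trained deep features followed by kernelization and normalization of the distance matrix) can be sketched roughly as below. The function names, the whitened-PCA choice, and the exponential kernel with parameter `gamma` are illustrative assumptions, not the thesis' actual implementation.

```python
import numpy as np
from sklearn.decomposition import PCA

def remap_features(db_feats, query_feats, out_dim=256):
    """Reduce deep-feature dimensionality with PCA fitted on the database
    features, then L2-normalize so cosine similarity becomes a dot product."""
    pca = PCA(n_components=out_dim, whiten=True)   # out_dim <= min(N_db, D)
    db_low = pca.fit_transform(db_feats)           # (N_db, out_dim)
    q_low = pca.transform(query_feats)             # (N_q, out_dim)
    db_low /= np.linalg.norm(db_low, axis=1, keepdims=True)
    q_low /= np.linalg.norm(q_low, axis=1, keepdims=True)
    return db_low, q_low

def kernelized_distance_matrix(db_low, q_low, gamma=2.0):
    """Cosine distance matrix passed through an exponential kernel and
    row-normalized, so each query row becomes a comparable match score."""
    dist = 1.0 - q_low @ db_low.T                  # (N_q, N_db), in [0, 2]
    kernel = np.exp(-gamma * dist)                 # emphasize small distances
    return kernel / kernel.sum(axis=1, keepdims=True)
```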
The remapping algorithm is shown to be effective qualitatively on Norland, a cross-seasonal large-scale scene dataset. The improvement in image retrieval accuracy and the reduction in computational cost are further verified by qualitative and quantitative experiments on the Revisited-Oxford and Revisited-Paris datasets.

For the other topic of this paper, monocular visual odometry, we propose three solutions.

1) Scale recovery based on road geometric constraints. We find that geometric information is more robust than color information and that, unlike region-of-interest methods, the whole image can be exploited when detecting and selecting road points. We therefore propose to replace color information with geometric information. Taking the distance between the camera and the road plane as the reference, we compute the road model by detecting and selecting road feature points under several geometric constraints, and the scale factor of each frame is recovered from the known camera height (see the sketch after this list). Experimental results on the KITTI dataset show that the algorithm performs well across different sequences.

2) Scale recovery based on autonomous modeling of absolute references. To improve the adaptability of monocular visual odometry in complex and changing scenes, this study explores how to model and measure the stable regions of a scene. The image is divided into multiple raster regions, the absolute depths of the feature points falling in each raster are modeled probabilistically, and stable metric information is selected from the different references. The stable regions of the scene are modeled and characterized in three different ways, so that manual modeling is replaced by autonomous learning based on environmental reference information.

3) End-to-end scale recovery based on deep learning. Deep-learning-based monocular visual odometry datasets for road wheeled vehicles are currently scarce, and their acquisition criteria impose additional requirements. To reduce the dependence on datasets and to optimize the learning target, this paper re-models the motion of road wheeled vehicles from the perspective of their motion characteristics, such as hardware structural constraints and dynamic constraints. By focusing the vehicle motion on the main motion axes and removing redundant motion through motion decoupling, a lightweight end-to-end visual odometry system based on deep learning is designed.
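The geometric idea behind solution 1) can be illustrated with a minimal sketch: triangulated road points (up to scale, in the camera frame) are fitted to a plane by random sampling, the unscaled camera height is read off as the camera-to-plane distance, and the scale factor is the ratio of the known mounting height to that estimate. The function names, thresholds, and the plain RANSAC loop below are illustrative assumptions; the thesis additionally uses Delaunay triangulation, depth- and road-model-consistency point selection, and filtering of the scale estimates.

```python
import numpy as np

def fit_road_plane_ransac(points, iters=200, inlier_thresh=0.02, rng=None):
    """Fit a plane n.x + d = 0 to up-to-scale 3D road points by random sampling.
    Returns the unit normal n and offset d of the plane with the most inliers."""
    rng = rng or np.random.default_rng(0)
    best_inliers, best_plane = 0, None
    for _ in range(iters):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(n)
        if norm < 1e-9:                      # skip degenerate (collinear) samples
            continue
        n /= norm
        d = -n @ p0
        inliers = np.sum(np.abs(points @ n + d) < inlier_thresh)
        if inliers > best_inliers:
            best_inliers, best_plane = inliers, (n, d)
    return best_plane

def recover_scale(road_points_cam, known_camera_height):
    """Scale factor = true camera height / estimated (up-to-scale) height,
    where the estimated height is the camera-origin-to-plane distance |d|."""
    n, d = fit_road_plane_ransac(road_points_cam)
    estimated_height = abs(d)
    return known_camera_height / estimated_height
```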
Based on the research and exploration of the above issues, the main contributions and innovations of this thesis are the following four aspects.

1. A deep feature remapping algorithm based on a neural network is proposed. Original features are extracted from a pre-trained neural network so as to exploit the advantages of deep features, and principal component analysis is applied to remap and optimize them. A sequence frame retrieval method then replaces the complicated data association graph, and the distance matrix is morphologically processed to find the best matching sequence online. Tests under different seasonal environmental conditions qualitatively demonstrate that the method maintains visual localization accuracy while saving computing resources. Finally, the dimension remapping algorithm is tested on SOLAR global deep features, which shows that our method not only saves computational resources but also improves retrieval precision.

2. A geometry-constrained scale estimation algorithm for monocular visual odometry is proposed. Road point selection and road model calculation are combined into one problem and optimized iteratively: road points are detected according to the road geometry model, and the road model is then updated from the verified points. Delaunay triangulation segments the feature points into non-overlapping triangles with the feature points as vertices, and points are further selected using depth-consistency and road-model-consistency strategies. Random sample consensus is used to obtain the height of the camera mounted on the robot, the camera height is used to estimate the scale of each frame, and the scale noise is removed by a filtering algorithm. The method is easy to implement with low hardware requirements, and the experimental results show that it improves the accuracy of monocular visual odometry.

3. A scale recovery algorithm for monocular visual odometry that replaces manual modeling with autonomous learning is proposed. A video with an absolute scale is used as the input of a monocular visual odometry system to autonomously perceive a stable reference region in the scene, while the absolute measurement value and the stability of the region are modeled mathematically. Based on this modeling and measurement of the stable region, the image is divided into multiple raster regions and the absolute depths of the feature points falling in each raster are modeled probabilistically. Three different ways of characterizing the stable region and modeling the stable reference are explored, with a corresponding scale recovery method for each. Experiments show that the probabilistic histogram-based scene modeling and its corresponding method achieve the best scale recovery results, so that manual modeling can be replaced by automatic computation of the absolute scale of monocular visual odometry.

4. A lightweight end-to-end monocular visual odometry network based on vehicle model simplification is proposed. The traditional monocular visual odometry pipeline is replaced by a deep learning method designed around the characteristics of road vehicle motion. The ground vehicle motion model is re-modeled by quantitatively evaluating the ground-truth pose displacement along each axis, which lets the motion estimation model focus on the main motion components. The secondary motion directions with smaller displacements are analyzed, the causes of the unexpected X-axis translation are identified, and the relationship between X-axis translation and Y-axis rotation is modeled through motion decoupling, reducing the pose error introduced by the decoupling (a decoupling sketch follows this list). Finally, a lightweight convolutional neural network is constructed that simplifies the 6-degrees-of-freedom vehicle motion model into a 2-degrees-of-freedom primary motion model; it can be trained on a GPU with about 2 GB of video memory and runs in real time on a CPU. Testing and comparison on the KITTI dataset show that the proposed motion focusing and decoupling method reduces both the training time and the reliance on training data.
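The abstract does not spell out the decoupling equations, but the idea of relating the residual X-axis translation to the Y-axis (yaw) rotation can be sketched with a planar circular-arc assumption, so that a 2-DoF output (forward distance and yaw per frame pair) is expanded back to the full planar pose. The arc-length interpretation and the function below are illustrative assumptions, not the thesis' actual motion model.

```python
import numpy as np

def decouple_planar_motion(forward_dist, yaw):
    """Expand a 2-DoF prediction (forward distance along the camera z-axis and
    yaw about the y-axis) into a planar pose, assuming the vehicle moves on a
    circular arc so the lateral x-translation is determined by the two values."""
    if abs(yaw) < 1e-6:                  # straight-line motion: no lateral term
        t_x, t_z = 0.0, forward_dist
    else:
        radius = forward_dist / yaw      # treat forward_dist as the arc length
        t_z = radius * np.sin(yaw)
        t_x = radius * (1.0 - np.cos(yaw))
    return t_x, t_z, yaw                 # (lateral, forward, heading change)
```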
Keywords/Search Tags: Image Retrieval, Deep Feature Extraction, Monocular Visual Odometry, Scale Recovery, Convolutional Neural Network