
Research On Learning Based Monocular Simultaneous Localization And Mapping Methods

Posted on: 2022-02-16    Degree: Doctor    Type: Dissertation
Country: China    Candidate: W Q Zhang    Full Text: PDF
GTID: 1488306569484264    Subject: Computer application technology
Abstract/Summary:
Simultaneous localization and mapping (SLAM) is an important research topic in robotics. It focuses on methods that allow robots equipped with sensors, e.g., cameras and Lidar, to estimate their own position while gathering information about the surrounding environment. Recently, visual SLAM (v-SLAM) has attracted much attention with the rise of driverless vehicles and augmented reality applications. Monocular SLAM relies only on image intensity; it estimates the camera motion while mapping the 3D environment according to multi-view geometry theory. Direct monocular SLAM tracks frames by minimizing the photometric error without feature extraction, which makes it robust to textureless environments and motion blur. However, because of the intensity constancy assumption, its performance can be degraded by image noise, illumination changes, occlusions and dynamic objects in the scene, which cause wrong inter-frame matching and reduce localization and reconstruction accuracy. Compared with feature-based SLAM methods, which can apply bundle adjustment to refine the camera pose, direct methods lack reliable matching of sparse feature points shared across frames, so the accuracy of their inter-frame motion estimation is usually lower. Machine learning and deep learning have been combined with SLAM to improve 3D geometry estimation, e.g., of depth and pose. Beyond enhancing pixel-level motion estimation, learning-based methods estimate camera motion in a simpler way than traditional pipelines, which involve inter-frame matching and optimization. However, the robustness and generalization ability of these methods are still limited, and learning-based direct monocular SLAM needs further study to achieve high-precision localization. In real-time direct SLAM systems, localization and mapping are separated into two parallel threads: the front-end performs inter-frame motion estimation, called visual odometry (VO), and the back-end optimizes the motion and
reconstructs the scene. Loop closure and relocalization are important components of the back-end, where location candidates are provided by image retrieval; traditional feature extraction yields candidates of low accuracy. Moreover, the appearance variety of the same location and the high similarity between different locations make appearance-based visual localization with feature extraction and distance measures a challenging task. Thus, how to integrate the representation ability of deep features and improve the accuracy of visual localization in the SLAM back-end with learning-based methods deserves deeper exploration. In this dissertation, we aim to develop a more precise and robust SLAM system, addressing both the front-end inter-frame motion estimation and the back-end visual localization with learning-based methods. The contributions can be summarized as follows:

(1) To tackle the low accuracy and unstable performance of localization in large-scale monocular direct SLAM, we present a Ground Control Points (GCPs) based SLAM method that uses a confidence prediction model for stereo matching in the direct method. We improve depth refinement with different update strategies based on the confidence estimation, avoiding inaccurate depth predictions in depth fusion. The original objective function is regularized using pixels with high confidence values chosen as GCPs, which up-weights the reliable residuals in the optimization. The proposed method improves localization accuracy and robustness to motion blur, quick rotation and scene dynamics, while preserving real-time performance.

(2) The appearance of a location varies with changes of the environment, making it difficult to extract an effective representation for visual localization. Traditional features based on the Bag-of-Words (BoW) model cannot handle complex scenarios of city-scale visual localization; they have recently been replaced by deep features extracted by convolutional neural networks (CNNs), which however lack self-adaptation. We present a second-order statistics learning method for robust
geo-location representation, in which the covariance matrix of feature maps is estimated through a covariance pooling method. By adopting parametric normalization through several convolutional layers, we compute an adaptive shrinkage offset for each sample covariance matrix, which improves the self-adaptation ability of the network. The proposed network can be trained by minimizing a triplet ranking loss with hard negatives. The learned robust representation helps recognize the same location despite extreme illumination changes, different viewpoints, partial occlusion, seasonal variation and dynamic objects.

(3) Traditional visual localization is cast as image retrieval that finds the most similar places in a database for the SLAM back-end, which requires checking correspondences to ensure the accuracy of candidates. To measure the similarity of geo-location representations with a robust and efficient distance metric, we explore metric-learning-based per-location classification combined with the Exemplar-SVM (E-SVM). The proposed metric learning based multiple kernel classifier (MLMKC) extends the Gaussian RBF kernel with a Mahalanobis distance matrix constructed from sample pairs. To accelerate convergence, we employ spectral projected gradient descent for MLMKC (SPG-MLMKC), which reduces the frequency of SVM solving. The proposed classifier pays more attention to the discriminative part of each location, which benefits the similarity computation and improves relocalization accuracy.

(4) To address the lack of generalization ability in most deep VO methods, we propose a self-supervised method for deep direct VO (Deep DVO) based on differentiable inverse compositional alignment. An uncertainty network is presented to regularize the original photometric loss for robust estimation. To incorporate the traditional VO module of motion optimization, on-line least-squares optimization is integrated for end-to-end training, and the warp parameters are updated with a weighted Gauss-Newton algorithm. As
direct alignment suffers from bad initialization, we use a regression network to predict a reliable initial pose that guarantees convergence. The inter-frame motion is updated with an on-line weighted least-squares solver, which improves accuracy and generalization ability compared with other deep learning based VO methods. Besides, a super-resolution layer is adopted to up-sample the depth map so that the photometric loss is computed at the original resolution, improving depth estimation at object boundaries and for scene dynamics.
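The photometric objective that the direct methods above minimize can be sketched as follows. This is a minimal illustration under assumed pinhole-camera conventions, not the dissertation's implementation; all names (`photometric_error`, `img_ref`, `depth_ref`) are illustrative, and real systems add robust weighting, sub-pixel interpolation and a coarse-to-fine pyramid.

```python
import numpy as np

def photometric_error(img_ref, img_cur, depth_ref, K, R, t):
    """Mean squared intensity difference between a reference frame and a
    current frame, with reference pixels warped by the relative pose (R, t).
    K is the 3x3 camera intrinsic matrix; depth_ref holds per-pixel depths."""
    h, w = img_ref.shape
    K_inv = np.linalg.inv(K)
    err, count = 0.0, 0
    for v in range(h):
        for u in range(w):
            # Back-project the reference pixel to a 3D point using its depth.
            p = depth_ref[v, u] * (K_inv @ np.array([u, v, 1.0]))
            # Transform the point into the current camera and project it.
            q = K @ (R @ p + t)
            if q[2] <= 0:  # point behind the camera: no residual
                continue
            u2, v2 = int(round(q[0] / q[2])), int(round(q[1] / q[2]))
            if 0 <= u2 < w and 0 <= v2 < h:
                r = img_cur[v2, u2] - img_ref[v, u]  # photometric residual
                err += r * r
                count += 1
    return err / max(count, 1)
```

With the identity pose on identical frames every residual vanishes, which is the fixed point a direct tracker converges to; intensity-constancy violations (noise, illumination, dynamics) enter exactly through the residual `r`, motivating the confidence- and uncertainty-based weighting described above.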
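The weighted Gauss-Newton update used for the on-line least-squares motion refinement can be sketched generically as below. This is a textbook weighted normal-equations step on a toy linear problem, not the dissertation's exact solver; the function name, the damping term and the example data are assumptions for illustration.

```python
import numpy as np

def weighted_gauss_newton_step(J, r, w, damping=1e-6):
    """One weighted Gauss-Newton update for min_x sum_i w_i * r_i(x)^2:
    solves (J^T W J + damping*I) dx = -J^T W r and returns dx.
    J: (n, d) Jacobian of the residuals, r: (n,) residuals,
    w: (n,) per-residual weights (e.g. predicted uncertainties)."""
    W = np.diag(w)
    H = J.T @ W @ J + damping * np.eye(J.shape[1])  # damped normal equations
    g = J.T @ W @ r
    return -np.linalg.solve(H, g)

# Toy linear problem r(x) = A x - b: for linear residuals a single
# weighted step lands on the weighted least-squares solution.
A = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
b = np.array([1.0, 2.0, 2.0])
w = np.array([1.0, 1.0, 0.5])
x = np.zeros(2)
x = x + weighted_gauss_newton_step(A, A @ x - b, w)
```

In the VO setting, `r` would be the photometric residuals, `J` their Jacobian with respect to the warp parameters, and `w` the learned per-pixel uncertainty weights, so that unreliable pixels contribute less to the pose update.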
Keywords/Search Tags: SLAM, Random Forest, Metric Learning, Visual Localization, Convolutional Neural Network