
Research On Key Technologies Of Monocular SLAM Based On Deep Learning Method

Posted on: 2021-02-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Z Ding
Full Text: PDF
GTID: 1368330614967709
Subject: Information and Communication Engineering
Abstract/Summary:
With the improvement of living standards, demand for intelligent mobile robots is growing in fields such as aerospace, domestic services, transportation, and entertainment. As the key technology for autonomous navigation of intelligent mobile robots, Simultaneous Localization and Mapping (SLAM) has attracted much attention, and visual SLAM in particular has made great progress in recent years. Monocular SLAM has become the main research object of visual SLAM owing to its simple structure, low cost, flexibility, and extensibility. However, traditional monocular SLAM based on feature matching still faces many problems in practice: in scenes with low texture or repetitive texture it is difficult to extract effective features for matching, and matching is easily affected by changes in lighting or weather. Features extracted by deep learning are more robust in these scenarios. Moreover, traditional methods emphasize modeling the geometric characteristics of the scene and do not perform higher-level perception of it. Introducing deep learning into monocular visual SLAM can provide better scene perception and thus a more robust monocular SLAM system. Against this background, this dissertation studies the key technologies of monocular SLAM based on deep learning.

The main functional modules of monocular SLAM include camera pose tracking, mapping, relocalization, and loop closure detection; rich scene-aware information is also crucial. Traditional feature-based monocular SLAM relies on feature extraction and matching for relocalization. In this dissertation, features for scene localization are first learned by deep networks, after which RGB images are fed directly into the trained networks for localization. When monocular SLAM builds a map, it obtains only the sparse point cloud of the feature points, which cannot well reflect the complete structure of the scene or the relations between objects in it. Dense depth maps and object-level semantic information are very helpful for true dense reconstruction and also improve the system's perception of the surrounding environment, so this dissertation likewise applies deep learning to better acquire depth and semantic information. The main contributions and innovations are summarized as follows:

1. For localization in known scenes, this dissertation proposes an hourglass network based on dual-stream information sharing for camera relocalization, which effectively improves relocalization accuracy. Instead of regressing the rotation R and translation T of the camera pose as a single vector, the method regresses them through two separate decoding branches and uses skip connections between the branches to share task information. In addition, the fixed balance factor of the multi-task loss function is made a learnable parameter, so the losses of the different tasks are balanced automatically during training (a minimal sketch of this weighting follows below). Compared with other similar methods, the proposed algorithm effectively improves relocalization accuracy on public datasets and opens the way to fast relocalization in large scenes.
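As an illustration of the learnable balance factor described above, the following is a minimal PyTorch sketch assuming the common homoscedastic-uncertainty formulation for weighting translation and rotation losses; the parameter names and initial values are illustrative, not taken from the dissertation:

```python
import torch
import torch.nn as nn

class LearnableWeightedPoseLoss(nn.Module):
    """Balances translation and rotation losses with learnable factors.

    Sketch of the idea in the abstract: instead of a fixed balance
    factor, the log-variances s_t and s_r are trained jointly with the
    network so the two task losses weight themselves.
    """
    def __init__(self, s_t_init=0.0, s_r_init=-3.0):
        super().__init__()
        # Log-variance parameters, optimized together with network weights.
        self.s_t = nn.Parameter(torch.tensor(s_t_init))
        self.s_r = nn.Parameter(torch.tensor(s_r_init))

    def forward(self, t_pred, t_gt, r_pred, r_gt):
        loss_t = torch.norm(t_pred - t_gt, dim=-1).mean()  # translation error
        loss_r = torch.norm(r_pred - r_gt, dim=-1).mean()  # rotation (e.g. quaternion) error
        # exp(-s) down-weights the noisier task; the +s term keeps the
        # network from trivially inflating its uncertainty.
        return (loss_t * torch.exp(-self.s_t) + self.s_t
                + loss_r * torch.exp(-self.s_r) + self.s_r)
```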
2. For scene depth estimation, traditional hand-crafted features require strong prior assumptions and cannot cope with complex situations, so this dissertation proposes two versions of monocular depth estimation based on deep learning. The first uses spatial pyramid pooling to extract multi-scale information and improve the accuracy of the depth map; the second uses an improved spatial pyramid structure that extracts multi-scale features in parallel through atrous convolutions (sketched below). To obtain robust depth output, a new loss function fusing uncertainty estimates is proposed, so that the network output and the estimated uncertainty are optimized jointly in one training process (also sketched below). Experiments verify the effectiveness of the method: performance on depth-estimation benchmarks is much better than that of previous methods, and the obtained depth maps are more refined. In addition, a point cloud fusion scheme is proposed that fuses the CNN-estimated depth and its corresponding uncertainty into SLAM, effectively reducing the scale uncertainty of monocular SLAM; as a consequence, it yields better dense 3D reconstruction and a more robust visual SLAM system.

3. For scene structure estimation, running a separate network for each task consumes considerable hardware resources in real applications, so a multi-modal joint estimation method is proposed: the scene structure is estimated simultaneously by a joint network with shared features. Integrating multi-scale network modules yields consistent performance gains. Beyond that, a lightweight network structure using a more efficient convolution (see the sketch below) is also combined to achieve better real-time performance. Experimental results verify the effectiveness of these improvements: compared with other similar methods, the proposed multi-modal joint estimation needs less training data yet achieves comparable accuracy and better visualization results. In addition to the RGB-based scheme, a joint estimation scheme based on RGB-D input is proposed; with only a small amount of depth input, the performance of monocular depth estimation and semantic segmentation is greatly improved. The resulting high-precision depth perception and semantic segmentation provide richer scene information for monocular visual SLAM and help intelligent mobile robots run robustly in real scenes.
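To make the parallel atrous-convolution structure in contribution 2 concrete, here is a minimal sketch of a dilated spatial pyramid module; the dilation rates and channel widths are illustrative assumptions, since the abstract does not specify them:

```python
import torch
import torch.nn as nn

class AtrousPyramid(nn.Module):
    """Parallel atrous (dilated) 3x3 convolutions at several rates,
    concatenated and fused by a 1x1 convolution — a minimal sketch of the
    'improved spatial pyramid' idea for extracting multi-scale features
    without reducing spatial resolution.
    """
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        # padding=r with dilation=r keeps the output the same spatial size.
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False)
            for r in rates
        ])
        self.fuse = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):
        # Each branch sees a different receptive field over the same input.
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))
```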
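The abstract does not give the exact form of the uncertainty-fused loss, so the sketch below uses the standard heteroscedastic formulation as a stand-in: the network predicts a per-pixel log-uncertainty map alongside the depth map, and both are optimized jointly:

```python
import torch

def uncertainty_fused_depth_loss(depth_pred, log_var, depth_gt, mask=None):
    """Jointly optimizes predicted depth and per-pixel uncertainty.

    Residuals at pixels the network marks as uncertain are attenuated by
    exp(-log_var), while the +log_var term prevents the network from
    declaring everything uncertain. (Standard heteroscedastic form; the
    dissertation's exact loss may differ.)
    """
    residual = torch.abs(depth_pred - depth_gt)
    loss = residual * torch.exp(-log_var) + log_var
    if mask is not None:          # ignore pixels without ground-truth depth
        loss = loss[mask]
    return loss.mean()
```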
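Finally, for the "more efficient convolution" in the lightweight network of contribution 3, the abstract does not name the operator; a depthwise separable convolution is one common choice, sketched here purely as an illustration:

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """One efficient-convolution block of the kind a lightweight branch
    could use: a depthwise 3x3 convolution followed by a 1x1 pointwise
    convolution, which costs far fewer multiply-adds than a standard
    3x3 convolution. (Illustrative assumption, not the dissertation's
    confirmed operator.)
    """
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1,
                      groups=in_ch, bias=False),  # depthwise: one filter per channel
            nn.BatchNorm2d(in_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, out_ch, 1, bias=False),  # pointwise: mix channels
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)
```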
Keywords/Search Tags:camera relocalization, convolutional neural network, depth estimation, uncertainty measurement, 3D reconstruction, semantic segmentation, multi-task learning