
Research And Application Of High Precision Visual Localization In Indoor Complex Scenes

Posted on: 2023-02-10    Degree: Doctor    Type: Dissertation
Country: China    Candidate: N Li    Full Text: PDF
GTID: 1528307040970929    Subject: Computer Science and Technology
Abstract/Summary:
With the development and application of the Internet of Things and mobile multimedia, location-based applications have become an indispensable part of daily life. In recent years, vision-based indoor localization has gradually become the main way to obtain accurate location information. Visual localization refers to the use of image retrieval and matching techniques to estimate the pose of a query image on a map of a known scene. However, existing visual localization technology cannot simultaneously achieve low computational complexity and high-precision localization in complex indoor scenes. Such scenes, with large numbers of mutually occluding objects, similar structures, and wide fields of view, pose many challenges to visual localization; in particular, three key scientific problems must be solved: low accuracy in constructing indoor semantic maps, high localization complexity, and large localization errors. The causes of the three problems are as follows: mutual occlusion among many objects makes the semantic contours of different objects overlap, so the scene-perception phase cannot provide accurate segmentation of localizable areas for semantic-map construction; similar structural layouts force a large number of retrievals and iterations when aligning the query image with the scene's 3D point cloud, resulting in high complexity and poor timeliness; and wide-view scenes contain many distorted and deformed features, which prevent feature alignment between the query image and scene images in the matching phase and lead to large localization errors. In response to these problems, the dissertation carries out research on techniques for the efficient acquisition of high-precision visual localization in complex indoor scenes, and achieves the following innovative results:

(1) A map construction method based on accurate segmentation of image semantic features for indoor visual localization. Complex indoor scenes contain many dense areas with mutually occluding objects, which prevents accurate semantic feature information from being provided when constructing indoor semantic maps. To address this problem, the dissertation proposes a semantic segmentation model with a multi-modal attention mechanism that provides accurate segmentation information for the construction of high-precision semantic maps. The model comprehensively integrates image RGB features and depth features, learns the attention of different features through the proposed image RoI (Region of Interest) enhanced channel attention, spatial attention, and feature reshaping mechanisms, and uses cross-correlation to fully exploit the complementarity between the feature modalities. It extracts semantically enhanced representations rich in spatial location information, reducing the blurring of semantic feature edges between overlapping objects and improving the accuracy of indoor semantic maps. Experimental results show that the proposed model achieves average pixel segmentation accuracies of 85.06% and 84.15% on the indoor occlusion scenes of the public datasets ADE20K and SUN, respectively, exceeding mainstream semantic segmentation models such as DeepLab, APCNet, and SETR. In practical application scenarios, the high-precision indoor semantic maps constructed in the dissertation improve the accuracy of visual localization by 12%.

(2) An accurate association method between key sparse features and 3D point clouds for indoor visual localization. Similar structural layouts and textures in complex indoor scenes cause repeated retrieval and a large number of matching iterations between images and scenes, which makes visual localization highly complex and prevents efficient acquisition of precise results. To address this problem, the dissertation makes a breakthrough in two stages. First, a lightweight key sparse feature extraction model is proposed to quickly extract discriminative global depth features, so that key sparse global descriptors substantially improve the retrieval efficiency of multi-dimensional features and overcome the impact of repetitive retrieval. Second, an efficient association strategy between a scene's 3D points and the 2D features of its projection region preserves the key image pixels associated with the scene model, overcomes the cost of the many iterations in 2D-3D matching, and reduces the complexity of visual localization while ensuring its accuracy and timeliness in complex scenes. Experimental results show that the proposed method achieves average localization accuracies of 0.85 m and 0.91 m on the public indoor datasets TUMindoor and InLoc, respectively, with average localization times of 1.6 s, 1.3 s, and 3.6 s on the two datasets and their fused dataset. The average localization efficiency is 48% higher than that of mainstream visual localization algorithms such as 2D-3D, InLoc, and HFNET, achieving efficient visual localization of complex indoor scenes.

(3) A precise matching method for image confidence sparse features in indoor visual localization. Indoor wide-view scenes contain many distorted and deformed visual features: wide baselines, long viewing distances, perspective changes, and weakly textured regions all increase visual localization errors. Aiming at the problem that accurate image alignment cannot be completed in wide-view scenes, a Transformer-based image sparse confidence feature matching model is proposed, which fuses semantic information with sparse features to perceive localization-failure scenes and perform coarse-to-fine feature matching. For the distorted features present in wide-view scenes, the proposed feature-relevance attention learning mechanism learns the specificity and relevance of sparse features, overcoming abnormal attention on some features and raising the confidence of matched sparse features, which effectively improves image matching and visual localization performance in wide-view scenes. The feature matching model and visual localization were evaluated on the public datasets HPatches and InLoc, respectively. In the image matching experiments, an average corner-error AUC (Area Under Curve) of 79.8% is achieved in wide-view scenes, outperforming mainstream image matching models such as D2-Net, SuperGlue, and LoFTR; in the visual localization experiments, the proposed feature matching model improves the average localization accuracy of related visual localization algorithms such as IBL, HAIL, and EfiLoc by 12.6%.
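To illustrate the multi-modal fusion idea behind contribution (1), the following is a minimal pure-Python sketch of channel-wise gated fusion of RGB and depth feature maps. It is not the dissertation's learned attention model: the sigmoid gate over pooled activations, the feature shapes, and all function names here are illustrative assumptions standing in for the learned channel-attention weights.

```python
import math

def global_avg_pool(channel):
    # channel: 2D list (H x W) -> scalar mean activation
    total = sum(sum(row) for row in channel)
    count = sum(len(row) for row in channel)
    return total / count

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention_fuse(rgb_feats, depth_feats):
    """Fuse per-channel RGB and depth feature maps with a sigmoid gate
    computed from their pooled activations -- a toy stand-in for the
    learned channel-attention weights in the proposed model."""
    fused = []
    for rgb_c, depth_c in zip(rgb_feats, depth_feats):
        # gate in (0, 1): how much the RGB channel dominates the fusion
        g = sigmoid(global_avg_pool(rgb_c) - global_avg_pool(depth_c))
        fused_c = [
            [g * r + (1.0 - g) * d for r, d in zip(r_row, d_row)]
            for r_row, d_row in zip(rgb_c, depth_c)
        ]
        fused.append(fused_c)
    return fused
```

In the actual model the gate would be produced by learned layers rather than a fixed pooled difference, but the structure, per-channel weights modulating a convex combination of the two modalities, is the same.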
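The retrieval stage of contribution (2), using compact global descriptors to prune the candidate set before expensive 2D-3D association, can be sketched as a nearest-neighbor search by cosine similarity. This is a generic sketch, not the dissertation's lightweight extraction model; the descriptor dimensionality and the function names are assumptions.

```python
import math

def cosine(a, b):
    # cosine similarity between two descriptor vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve_top_k(query_desc, db_descs, k=2):
    """Rank database images by global-descriptor similarity; only the
    top-k candidates proceed to the expensive 2D-3D association step,
    which is what keeps overall localization complexity low."""
    scored = sorted(
        ((cosine(query_desc, d), img_id) for img_id, d in db_descs.items()),
        reverse=True,
    )
    return [img_id for _, img_id in scored[:k]]
```

With discriminative descriptors, repeated retrievals over similar-looking areas collapse into a short candidate list, which is the efficiency effect the abstract attributes to the key sparse global descriptors.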
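Contribution (3) keeps only high-confidence sparse matches. A common way to realize that, used here purely as an illustrative sketch (it follows the dual-softmax scheme popularized by LoFTR, not necessarily the dissertation's exact formulation), is to turn a feature-similarity matrix into matching probabilities and keep mutual best pairs above a confidence threshold:

```python
import math

def dual_softmax_matches(sim, conf_thresh=0.5):
    """Convert a similarity matrix (rows: query features, cols: scene
    features) into confident matches: softmax over each row and each
    column, multiply the two probabilities, and keep mutual-best pairs
    whose combined confidence exceeds the threshold."""
    def softmax(xs):
        m = max(xs)
        exps = [math.exp(x - m) for x in xs]
        s = sum(exps)
        return [e / s for e in exps]

    row_p = [softmax(row) for row in sim]                # P(col | row)
    col_p = [softmax(list(col)) for col in zip(*sim)]    # P(row | col)
    matches = []
    for i, row in enumerate(sim):
        j = row.index(max(row))                          # best column for row i
        col = [sim[r][j] for r in range(len(sim))]
        if col.index(max(col)) != i:                     # mutual-best check
            continue
        conf = row_p[i][j] * col_p[j][i]
        if conf >= conf_thresh:
            matches.append((i, j, conf))
    return matches
```

Distorted or weakly textured regions produce flat similarity rows, so their softmax confidence stays low and they are filtered out, which is the intuition behind suppressing "abnormal attention" on unreliable features.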
Keywords/Search Tags:Indoor Visual Localization, Maps Construction, Semantic Segmentation, Image Matching, 2D-3D Matching