Research On Camera Pose Estimation Method Based On Deep Neural Network

Posted on: 2024-09-25
Degree: Master
Type: Thesis
Country: China
Candidate: H J Li
Full Text: PDF
GTID: 2568307097962949
Subject: Electronic information
Abstract/Summary:
Camera pose estimation is the computation of a camera's position and orientation in 3D space using image processing and computer vision techniques. It has important applications in robot navigation, autonomous driving, and augmented reality. In practice, however, illumination changes, rotation between images, scene dynamics, and sparse texture can substantially degrade the accuracy of camera pose estimation. To improve that accuracy, this thesis investigates camera pose estimation based on deep neural networks and designs robust algorithms for different scenarios and applications, using an NVIDIA GeForce RTX 2070 GPU as the computing platform. Two methods are proposed: a camera absolute pose regression method guided by contextual global self-attention, and a scene coordinate regression pose estimation method based on deep feature fusion. The details are as follows.

To improve the robustness of camera pose regressors in dynamic environments, an end-to-end camera absolute pose regression network guided by contextual global self-attention is designed. The model takes a single image as input and uses a global contextual self-attention module to extract fine-grained, robust geometric features, which reduces the influence of dynamic objects and illumination conditions on feature extraction and improves the robustness of pose estimation. The extracted features are then aggregated across deep feature channels using replacement attention, which further improves the robustness of the model and yields more accurate pose estimates. The absolute camera pose regression model itself is built from a multilayer perceptron and a Euclidean distance loss function; by directly predicting the pose vector, it achieves end-to-end camera pose estimation. The method is evaluated on common publicly available indoor and outdoor datasets. The experimental results show that it maintains localization accuracy while showing stronger robustness, especially on the highly variable outdoor datasets. Extensive ablation experiments further demonstrate the effectiveness of the contextual global self-attention residual blocks in improving model robustness, providing new solutions for practical applications such as robot navigation, autonomous driving, and augmented reality.

To address the difficulty of extracting effective image features from indoor scenes with sparse or repeated textures, this thesis proposes a scene coordinate regression network based on attention-guided deep feature fusion. Given a single RGB image, the network predicts the scene coordinates corresponding to the image pixels and estimates the pose with a model fine-tuned using the reprojection error. Unlike previous 3D scene coordinate regression methods that use only high-level features, this thesis argues that a robust feature representation containing rich spatial detail and structural information helps improve camera localization performance. A deep feature fusion module is therefore designed into the network. It fuses multi-level contextual information and makes full use of the rich spatial details in the low-level feature maps, which improves the recognition of repetitive or low-texture surfaces and, in turn, camera localization performance. The proposed method is validated experimentally on the 7Scenes dataset, and the results show that the model makes full use of the image information to learn the 2D-3D correspondences of the current scene directly and effectively improves camera pose estimation accuracy.
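
The two training signals named in the abstract, a Euclidean distance loss on the predicted pose vector and a reprojection error on predicted scene coordinates, can be illustrated with a minimal sketch. It is not the thesis implementation. The pose layout (3D translation plus a unit quaternion in w, x, y, z order), the weight `beta`, the pinhole camera model, and the function names `pose_loss` and `reprojection_error` are all assumptions made for illustration.

```python
import math


def pose_loss(t_pred, q_pred, t_true, q_true, beta=1.0):
    """Euclidean distance loss on a pose vector.

    Assumption: the pose vector is a translation (3 values) plus a
    quaternion (4 values); beta is a hypothetical weight balancing the
    two terms. The q / -q sign ambiguity is not handled here.
    """
    # Normalise the predicted quaternion so it represents a rotation.
    norm = math.sqrt(sum(c * c for c in q_pred))
    q_unit = [c / norm for c in q_pred]
    # Euclidean distance for each part of the pose vector.
    t_err = math.dist(t_pred, t_true)
    q_err = math.dist(q_unit, q_true)
    return t_err + beta * q_err


def reprojection_error(scene_xyz, pixel_uv, R, t, focal, principal):
    """Pixel distance between an observed pixel and the projection of
    its predicted 3D scene coordinate.

    Assumption: pinhole camera, with R (3x3 nested lists) and t mapping
    world coordinates into the camera frame.
    """
    # World -> camera frame: X_c = R * X_w + t
    xc = [sum(R[i][j] * scene_xyz[j] for j in range(3)) + t[i]
          for i in range(3)]
    # Perspective projection onto the image plane.
    u = focal * xc[0] / xc[2] + principal[0]
    v = focal * xc[1] / xc[2] + principal[1]
    return math.hypot(u - pixel_uv[0], v - pixel_uv[1])


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(pose_loss([0.0, 0.0, 0.0], [2.0, 0.0, 0.0, 0.0],
                    [3.0, 4.0, 0.0], [1.0, 0.0, 0.0, 0.0]))
    print(reprojection_error([1.0, 0.0, 2.0], (570.0, 243.0),
                             identity, [0.0, 0.0, 0.0],
                             500.0, (320.0, 240.0)))
```

In an absolute pose regressor, a loss of the first kind would be minimised directly over the network output. In a scene coordinate regression pipeline, an error of the second kind would typically be averaged over many pixels, with the 2D-3D correspondences also passed to a PnP solver to recover the final pose.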
Keywords/Search Tags: Camera pose estimation, attention mechanism, deep feature fusion, scene coordinate regression, deep neural networks