A Pose Estimation Method Based On Pose Correction And Improved Positional Encoding

Posted on: 2024-09-19 | Degree: Master | Type: Thesis
Country: China | Candidate: Z H Liang
GTID: 2568307157482344 | Subject: Computer Science and Technology
Abstract/Summary:
Visual odometry refers to methods that use a camera to estimate the trajectory of a moving object; it is also known as visual positioning or pose estimation. Existing unsupervised pose estimation models have two shortcomings. First, errors in the estimated pose make the value of the loss function too large. Second, a convolutional neural network cannot perceive the temporal information contained in the channel dimension. To correct pose estimation errors and to make the convolutional neural network attend to this temporal information, this work studies an unsupervised pose estimation model based on pose correction and improved positional encoding. The contributions of this work are as follows.

First, an unsupervised pose correction model is proposed, consisting of a depth estimation network, a pose estimation network, and a pose correction network; the pose correction network has the same architecture as the pose estimation network. Existing unsupervised pose estimation models mainly build their loss function on view synthesis, and objects appear at somewhat different sizes in the synthesized view and the target view; this difference can guide the neural network in correcting the estimated pose. The pose correction network computes a pose correction value from the target view and the synthesized views. Each synthesized view is sampled from a reference view at coordinates generated from the depth of the target view and the inter-frame pose from the target view to that reference view. The corrected pose is the sum of the estimated pose and the pose correction value.

Second, an improved positional encoding mechanism is proposed. It covers three dimensions, time and the two spatial dimensions, and is itself a two-dimensional positional encoding. It can be obtained from one-dimensional positional encodings by matrix multiplication, and it retains the property that a later positional encoding can be derived from an earlier one. When temporally adjacent images are concatenated along the channel dimension and fed into an ordinary convolutional neural network, the network treats every channel equally and therefore ignores the information in the time dimension. Introducing this positional encoding adds prior information that emphasizes the differences along the channel dimension, enabling the convolutional neural network to focus on the information in the time dimension.

Finally, extensive ablation and comparative experiments were conducted on the proposed pose correction model and improved positional encoding. Experimental results on the KITTI and EuRoC MAV datasets show that both effectively reduce the positioning error of the pose estimation model, demonstrating their effectiveness.
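The view-synthesis step and the additive pose correction described in the first contribution can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the thesis's implementation: a pinhole camera with intrinsics `K`, a grayscale image, a 6-DoF pose vector `[tx, ty, tz, rx, ry, rz]` whose Euler angles are composed as `Rz @ Ry @ Rx`, and bilinear sampling. The names `pose_vec_to_mat`, `synthesize_view` and `correct_pose` are hypothetical, and the correction value, which the thesis obtains from the pose correction network, is simply passed in by hand.

```python
import numpy as np


def pose_vec_to_mat(pose):
    """6-DoF vector [tx, ty, tz, rx, ry, rz] (radians) -> 4x4 rigid transform.

    Assumption: Euler angles composed as Rz @ Ry @ Rx.
    """
    tx, ty, tz, rx, ry, rz = pose
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    rot_x = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    rot_y = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    rot_z = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    T = np.eye(4)
    T[:3, :3] = rot_z @ rot_y @ rot_x
    T[:3, 3] = [tx, ty, tz]
    return T


def correct_pose(pose_est, correction):
    """Corrected pose = estimated pose + pose correction value."""
    return np.asarray(pose_est, dtype=float) + np.asarray(correction, dtype=float)


def synthesize_view(ref_img, tgt_depth, K, pose):
    """Sample ref_img (H, W) where each target pixel lands in the reference view.

    tgt_depth: (H, W) depth of the target view; pose: 6-vector, target -> reference.
    Returns the synthesized view and a mask of pixels projecting inside ref_img.
    """
    H, W = tgt_depth.shape
    T = pose_vec_to_mat(pose)
    # Homogeneous pixel grid of the target view, shape (3, H*W)
    v, u = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    pix = np.stack([u.ravel(), v.ravel(), np.ones(H * W)]).astype(float)
    # Back-project with the target depth, then move points into the reference frame
    cam = (np.linalg.inv(K) @ pix) * tgt_depth.ravel()
    cam_ref = T[:3, :3] @ cam + T[:3, 3:4]
    # Project into the reference image, guarding against division by zero
    proj = K @ cam_ref
    depth_ref = proj[2]
    z = np.where(depth_ref > 1e-8, depth_ref, 1e-8)
    ur, vr = proj[0] / z, proj[1] / z
    # Bilinear sampling with indices clamped to the image border
    u0 = np.clip(np.floor(ur).astype(int), 0, W - 1)
    v0 = np.clip(np.floor(vr).astype(int), 0, H - 1)
    u1 = np.minimum(u0 + 1, W - 1)
    v1 = np.minimum(v0 + 1, H - 1)
    wu = np.clip(ur - u0, 0.0, 1.0)
    wv = np.clip(vr - v0, 0.0, 1.0)
    out = ((1 - wu) * (1 - wv) * ref_img[v0, u0] + wu * (1 - wv) * ref_img[v0, u1]
           + (1 - wu) * wv * ref_img[v1, u0] + wu * wv * ref_img[v1, u1])
    valid = (depth_ref > 1e-8) & (ur >= 0) & (ur <= W - 1) & (vr >= 0) & (vr <= H - 1)
    return out.reshape(H, W), valid.reshape(H, W)


# Toy usage: constant depth 1 and fx = 10, so tx = 0.1 shifts sampling by one pixel
rng = np.random.default_rng(0)
ref = rng.random((6, 8))
depth = np.ones((6, 8))
K = np.array([[10.0, 0.0, 4.0], [0.0, 10.0, 3.0], [0.0, 0.0, 1.0]])
pose_est = np.array([0.1, 0.0, 0.0, 0.0, 0.0, 0.0])
shifted, mask = synthesize_view(ref, depth, K, pose_est)
# Hypothetical correction value; in the thesis it comes from the pose correction network
pose_corr = correct_pose(pose_est, -pose_est)
identity_view, _ = synthesize_view(ref, depth, K, pose_corr)
```

In the toy run, the estimated pose shifts every sampling coordinate one pixel to the right, while a correction equal to the negative of the estimate yields the identity pose, so the synthesized view reproduces the reference image.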
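The abstract does not give the encoding's formula, but one construction shows how an encoding built by matrix multiplication can keep the "later positions derive from earlier ones" property. The standard Transformer sinusoidal encoding satisfies PE(p + k) = R(k) PE(p) for a fixed block-diagonal rotation R(k); if a two-dimensional encoding is taken as the outer product (a column vector times a row vector) of two 1-D encodings, then PE2D(p + k, q + l) = R(k) PE2D(p, q) R(l)^T. The sketch assumes exactly this construction; `pe_2d`, `shift_matrix` and the sizes used are hypothetical, and which axes the thesis assigns to time and space is not stated.

```python
import numpy as np


def _frequencies(d):
    # Transformer frequencies 1 / 10000^(2i/d), i = 0 .. d/2 - 1
    return 1.0 / (10000.0 ** (2 * np.arange(d // 2) / d))


def sinusoidal_pe(pos, d):
    """1-D sinusoidal encoding of position `pos` (d even), laid out as [sin, cos] pairs."""
    w = _frequencies(d)
    pe = np.empty(d)
    pe[0::2] = np.sin(pos * w)
    pe[1::2] = np.cos(pos * w)
    return pe


def shift_matrix(k, d):
    """Block-diagonal rotation R(k) with R(k) @ sinusoidal_pe(p, d) == sinusoidal_pe(p + k, d)."""
    R = np.zeros((d, d))
    for j, w in enumerate(_frequencies(d)):
        c, s = np.cos(k * w), np.sin(k * w)
        # [sin(a+b), cos(a+b)] = [[cos b, sin b], [-sin b, cos b]] @ [sin a, cos a]
        R[2 * j:2 * j + 2, 2 * j:2 * j + 2] = [[c, s], [-s, c]]
    return R


def pe_2d(p, q, d_p, d_q):
    """Hypothetical 2-D encoding: outer product of two 1-D encodings."""
    return np.outer(sinusoidal_pe(p, d_p), sinusoidal_pe(q, d_q))


# The shift property survives the matrix product:
# pe_2d(p + k, q + l) == R(k) @ pe_2d(p, q) @ R(l).T
P = pe_2d(2, 5, 8, 6)
P_shifted = shift_matrix(3, 8) @ P @ shift_matrix(4, 6).T
```

Because each factor is shifted by its own rotation, the two axes can be advanced independently; this is one reason an outer-product form is a natural candidate, though the thesis may combine the 1-D encodings differently.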
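The channel-dimension argument can be made concrete: if two identical frames are stacked along the channel axis, the corresponding channels are identical, so nothing in the input marks which frame they came from; adding an encoding of the frame index makes them differ. The sketch below is a deliberately reduced, hypothetical variant (`stack_with_time_encoding`, `scale`, and the per-channel constant offset are assumptions). It encodes only the time index with a 1-D sinusoidal encoding, whereas the thesis's encoding also covers the two spatial dimensions.

```python
import numpy as np


def sinusoidal_pe(pos, d):
    """1-D sinusoidal encoding of position `pos` (d even), laid out as [sin, cos] pairs."""
    w = 1.0 / (10000.0 ** (2 * np.arange(d // 2) / d))
    pe = np.empty(d)
    pe[0::2] = np.sin(pos * w)
    pe[1::2] = np.cos(pos * w)
    return pe


def stack_with_time_encoding(frames, scale=1.0):
    """Concatenate temporally ordered frames, each (C, H, W), along the channel axis,
    adding to every channel of frame t a constant taken from an encoding of t.

    Hypothetical simplification: only the time index is encoded, not spatial position.
    """
    C = frames[0].shape[0]
    d = C + (C % 2)  # sinusoidal_pe needs an even size; truncate back to C channels
    encoded = []
    for t, frame in enumerate(frames):
        enc = scale * sinusoidal_pe(t, d)[:C]
        encoded.append(frame + enc[:, None, None])  # broadcast over H and W
    return np.concatenate(encoded, axis=0)


# Two identical RGB frames: without encoding, channels 0 and 3 are indistinguishable
frame = np.random.default_rng(1).random((3, 4, 5))
plain = np.concatenate([frame, frame], axis=0)
encoded = stack_with_time_encoding([frame, frame])
```

Adding the encoding, rather than concatenating it as extra channels, keeps the channel count unchanged, so the network's first layer needs no modification; the abstract does not say which of the two the thesis uses.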
Keywords/Search Tags: Pose correction, Two-dimensional positional encoding, Unsupervised learning, Pose estimation method, Convolutional neural network