
Research On Ground Time-sensitive Target Pose Estimation Based On Monocular Vision

Posted on: 2023-05-25
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Li
GTID: 2558307169478774
Subject: Engineering
Abstract/Summary:
Target pose is a general term for the position and attitude of a target. Target pose estimation is the process of solving for the translation and rotation between the target body coordinate system and a reference coordinate system. In electro-optical reconnaissance and precision guidance, the pose of the target is an important parameter that reflects its current state, and it is of great significance for characterizing the target state, analyzing the battlefield situation, and selecting key parts of the target for precise strikes. This paper studies the estimation of pose parameters of ground targets from images acquired by low-altitude platforms. It focuses on time-sensitive ground targets such as tanks and armored vehicles, and uses deep learning to automatically detect targets and estimate their pose parameters from monocular input images. The research work covers the following four aspects.

Firstly, because deep-learning-based target pose estimation suffers from a lack of data samples, this paper introduces a simulation method that generates data sets of targets in different poses and labels them in batches, which significantly reduces the time cost of manual labeling.

Secondly, based on target image data with different poses, this paper applies data dimensionality reduction to reduce and visualize the target images, revealing their manifold characteristics in three-dimensional space, and uses shallow artificial neural networks to learn the pose information of the target.

Thirdly, a deep-feature-driven pose parameter regression network is constructed. Considering that translation and rotation affect the appearance of the target in the image differently, this paper adds to the network a dual attention module (Convolutional Block Attention Module, CBAM) that fuses spatial and channel features, together with deformable convolution (Deformable Convolutional Network, DCN), to enhance the network's ability to represent changes in target position and attitude. The experimental results show that, compared with the original network, introducing the CBAM attention module and DCN significantly reduces the error of the regressed pose parameters.

Fourthly, a target pose estimation method based on corresponding key points is studied. The method consists of two parts: key point detection and pose estimation. When the local features around a key point are hard to distinguish from the background, or when a change in target pose occludes some of the key points, the network has more difficulty regressing the key points, which ultimately degrades the subsequent pose estimation results. To solve this problem, this paper replaces conventional convolution with dilated convolution in the feature fusion part of the network. By expanding the receptive field of the network, the global information of the target can be better used to infer the local information of the target key points. The experimental results show that the proposed method effectively improves the accuracy of key point detection and reduces the error of pose estimation.
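The claim that dilated convolution expands the receptive field can be illustrated with a minimal sketch. It is not the network from the thesis: the layer configurations, the function names, and the stride-1 assumption are all hypothetical and chosen only for illustration. A kernel of size k with dilation d spans d * (k - 1) + 1 input samples, so a stack of dilated layers covers a wider context with the same number of weights.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Number of input samples spanned by one dilated kernel."""
    return dilation * (kernel_size - 1) + 1


def receptive_field(layers) -> int:
    """Receptive field of a stack of stride-1 convolutions.

    `layers` is a list of (kernel_size, dilation) pairs (hypothetical configs).
    """
    rf = 1
    for kernel_size, dilation in layers:
        # Each stride-1 layer widens the field by (effective size - 1).
        rf += effective_kernel_size(kernel_size, dilation) - 1
    return rf


def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1-D dilated convolution (cross-correlation, as in deep learning)."""
    span = effective_kernel_size(len(kernel), dilation)
    out = []
    for start in range(len(signal) - span + 1):
        # Kernel taps are placed `dilation` samples apart.
        out.append(sum(w * signal[start + i * dilation]
                       for i, w in enumerate(kernel)))
    return out


# Three 3-tap layers: conventional vs. dilations 1, 2, 4 (illustrative values).
conventional = [(3, 1), (3, 1), (3, 1)]
dilated = [(3, 1), (3, 2), (3, 4)]
print(receptive_field(conventional))  # 7
print(receptive_field(dilated))       # 15
```

With the same nine weights, the dilated stack sees 15 input samples instead of 7, which is the sense in which more global context becomes available when a key point is occluded or poorly distinguishable locally.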
Keywords/Search Tags: Pose Estimation, Attention Mechanism, Deformable Convolution, Key Point Detection, Dilation Convolution, Deep Learning