
Multi-Scale Convolutional Feature Fusion For 6D Pose Estimation

Posted on: 2024-05-01  Degree: Master  Type: Thesis
Country: China  Candidate: Y Ren  Full Text: PDF
GTID: 2568307184455524  Subject: Computer Science and Technology
Abstract/Summary:
6D pose estimation refers to computing the six-degrees-of-freedom (DOF) pose of a rigid object, that is, identifying the three-dimensional translation and three-dimensional rotation of the object in the image with respect to a standard reference frame. Estimating the 6D pose of an object plays an important role in areas such as augmented reality, autonomous driving, and 3D reconstruction. The development of deep learning has enabled 6D pose estimation methods to achieve better results. Among RGB-image-based methods, two-stage methods that establish correspondences between two-dimensional image points and three-dimensional model points have been studied extensively owing to their ease of use and high accuracy. However, practical application scenarios usually contain noise factors such as background clutter, mutual occlusion of objects, and illumination changes, which pose great challenges to accurate 6D pose estimation. To improve the performance, accuracy, and stability of real-time 6D pose estimation in complex scenes, this thesis investigates correspondence-based 6D pose estimation methods. The main research work is as follows:

To address the poor estimation accuracy for target objects under background clutter and illumination changes, this thesis proposes a 6D pose estimation method based on multi-scale convolutional feature fusion. In the semantic segmentation stage, the encoder-decoder network replaces the original convolutional layers with a lightweight multi-scale convolutional fusion module, so that the network captures multi-scale information and its ability to understand features is effectively enhanced. At the same time, a convolutional layer chain with residual learning is added in the skip connections to eliminate the semantic gap between different network layers, improving the segmentation performance of the network and, in turn, the pose estimation accuracy. Training and testing are performed on the public LINEMOD dataset. The experimental results show that, compared with the original method, the proposed method improves the 2D projection metric by 2.6% and ADD(-S) by 1%, which demonstrates its effectiveness for the 6D pose estimation task.

To address the poor stability of 6D pose estimation in occlusion scenes, where accurate estimation is difficult, and building on the preceding work, a 6D pose estimation method for occluded objects with a dual attention mechanism is proposed. A Convolutional Block Attention Module (CBAM) is added to the network, so that it generates attention feature maps along both the channel and spatial dimensions and performs adaptive feature refinement on the original feature map; this effectively enhances the spatial and location information of the feature map and improves the network's learning ability. Trained on the LINEMOD dataset, the method improves the 2D projection metric and ADD(-S) by 1.2% and 2.6%, respectively, over the original network; on the Occlusion LINEMOD dataset, the improvements are 0.4% and 0.9%, respectively. An ablation study further shows that combining the multi-scale convolutional feature fusion module with the dual attention mechanism effectively improves pose estimation accuracy, verifying the stability and accuracy of the proposed method in occlusion scenarios.
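The multi-scale fusion idea described above can be sketched in a few lines: run parallel convolutions with different kernel sizes over the same feature map and fuse the branch outputs. This is only an illustrative NumPy sketch under simplifying assumptions (single channel, placeholder averaging weights, fusion by averaging; the thesis's actual module and weights are not specified here).

```python
import numpy as np

def conv2d(x, k):
    """Naive single-channel 2D convolution with 'same' zero padding."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def multi_scale_fuse(x, kernel_sizes=(1, 3, 5)):
    """Parallel branches with different receptive fields, fused by
    averaging (a real module would learn the weights and fuse with
    concatenation plus a 1x1 convolution)."""
    branches = []
    for ks in kernel_sizes:
        k = np.full((ks, ks), 1.0 / (ks * ks))  # placeholder weights
        branches.append(conv2d(x, k))
    return sum(branches) / len(branches)
```

The key design point is that each branch sees a different neighborhood size, so the fused map carries both fine and coarse context without deepening the network.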
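The CBAM-style dual attention can likewise be sketched: a channel gate computed from global average- and max-pooled descriptors through a shared MLP, followed by a spatial gate computed from channel-wise pooling. This is a schematic NumPy sketch, not the thesis's implementation; in particular, CBAM applies a learned 7x7 convolution in the spatial branch, which is simplified here to the raw pooled maps, and the MLP weights `w1`, `w2` are assumed inputs.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """x: (C, H, W). Shared ReLU MLP on avg- and max-pooled channel
    descriptors; returns a per-channel gate in (0, 1)."""
    avg = x.mean(axis=(1, 2))                    # (C,)
    mx = x.max(axis=(1, 2))                      # (C,)
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)
    return sigmoid(mlp(avg) + mlp(mx))           # (C,)

def spatial_attention(x):
    """Pool across channels and gate each spatial location (CBAM uses
    a learned 7x7 conv here; omitted in this sketch)."""
    avg = x.mean(axis=0)
    mx = x.max(axis=0)
    return sigmoid(avg + mx)                     # (H, W)

def cbam(x, w1, w2):
    """Apply channel attention, then spatial attention (CBAM order)."""
    x = x * channel_attention(x, w1, w2)[:, None, None]
    return x * spatial_attention(x)[None, :, :]
```

Because both gates lie in (0, 1), the module can only rescale features, which is what makes the refinement "adaptive": informative channels and locations are preserved while others are suppressed.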
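The evaluation metrics quoted above are standard: ADD is the mean distance between model points transformed by the ground-truth and predicted poses, and ADD-S replaces each distance with the closest-point distance to handle symmetric objects. A minimal NumPy sketch (assumed helper names; `pts` is the object's 3D model point set):

```python
import numpy as np

def add_metric(R_gt, t_gt, R_pr, t_pr, pts):
    """ADD: mean pairwise distance between model points under the
    ground-truth and predicted rigid transforms."""
    gt = pts @ R_gt.T + t_gt
    pr = pts @ R_pr.T + t_pr
    return np.linalg.norm(gt - pr, axis=1).mean()

def add_s_metric(R_gt, t_gt, R_pr, t_pr, pts):
    """ADD-S: for symmetric objects, each ground-truth point is matched
    to its closest predicted point before averaging."""
    gt = pts @ R_gt.T + t_gt
    pr = pts @ R_pr.T + t_pr
    d = np.linalg.norm(gt[:, None, :] - pr[None, :, :], axis=2)
    return d.min(axis=1).mean()
```

A pose is typically counted as correct when ADD(-S) falls below 10% of the object diameter; the percentages reported in the abstract are accuracy improvements under such thresholds.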
Keywords/Search Tags: 6D pose estimation, semantic segmentation, multi-scale convolutional feature fusion, dual attention mechanism