6D pose estimation refers to determining the position and orientation of a given target in 3D space. In recent studies, many researchers have used RGB-D data as the information source for open scenes, combining point cloud and image data in subsequent work. Fusing point cloud features with image features increases the diversity of the data and the completeness of the information, thereby improving 6D pose estimation. However, point clouds and images belong to different modalities and differ substantially, and this difference often leads to an imbalance in the contributions of the two feature types during fusion. How to cope effectively with this contribution imbalance when fusing RGB image features with 3D point cloud features is therefore one of the main challenges in current 6D pose estimation.

In terms of feature extraction, most existing methods use non-graph or static-graph structures. Their drawback is that the downsampling process tends to discard effective features as the point set becomes sparse. In terms of fusion, most methods simply concatenate point cloud and image features and adopt a strategy that treats the two equally. Such methods often neglect the screening of complementary features among heterogeneous features, and linear superposition tends to average out asymmetric feature effects, which hinders effective differentiation of each heterogeneous feature's contribution to 6D pose estimation.

Addressing these difficulties in feature extraction and feature fusion for combined point cloud and image processing, this work proposes the following new explorations: (1) a dynamically updated feature-embedding network for extracting point cloud features, mitigating the effect of dimensional change on local point cloud features; (2) adaptive feature extraction to address the difficulty of quantifying the complementarity between point cloud and image features; (3) a normalized-exponential (softmax) weighting fusion scheme so that the unequal contributions of point cloud features and image features can be reflected.

Experiments are conducted on two benchmark RGB-D datasets, LineMOD and YCB-Video, with full comparison against other recent 6D pose estimation algorithms. The results show that the proposed method performs well across multiple target classes. On the LineMOD dataset it achieves an average accuracy of 97.5%, which is 2.4% better than PVN3D; on the YCB-Video dataset it achieves an average accuracy of 97.6%, which is 1% better than FFB6D. This demonstrates that the developed algorithm can effectively support 6D pose estimation application scenarios.
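The "dynamically updated feature embedding" in exploration (1) can be illustrated with an EdgeConv-style layer in the spirit of dynamic graph CNNs: the k-nearest-neighbor graph is rebuilt from the current feature space at every layer rather than fixed once from the input coordinates, so local neighborhoods adapt as features evolve. The sketch below is a minimal numpy illustration of that idea, not the paper's actual network; the function names and the single linear-plus-ReLU "MLP" are assumptions made for brevity.

```python
import numpy as np

def knn_indices(feats, k):
    """Indices of the k nearest neighbors of each point, in feature space."""
    # Pairwise squared Euclidean distances, shape (N, N).
    d2 = (np.sum(feats**2, axis=1, keepdims=True)
          - 2.0 * feats @ feats.T
          + np.sum(feats**2, axis=1))
    np.fill_diagonal(d2, np.inf)          # exclude self-matches
    return np.argsort(d2, axis=1)[:, :k]  # shape (N, k)

def edge_conv(feats, weight, k=4):
    """One EdgeConv-style layer: aggregate edge features over a kNN graph
    rebuilt from the *current* features (the dynamic-graph step)."""
    idx = knn_indices(feats, k)           # graph depends on current feats
    neighbors = feats[idx]                # (N, k, C)
    center = feats[:, None, :]            # (N, 1, C)
    # Edge feature = [center, neighbor - center], shape (N, k, 2C).
    edge = np.concatenate(
        [np.broadcast_to(center, neighbors.shape), neighbors - center],
        axis=-1)
    h = np.maximum(edge @ weight, 0.0)    # shared linear map + ReLU
    return h.max(axis=1)                  # max-pool over the k neighbors

# Stacking such layers re-derives the graph from each layer's output,
# which is what distinguishes a dynamic graph from a static one.
pts = np.random.default_rng(0).normal(size=(10, 3))
w1 = np.random.default_rng(1).normal(size=(6, 16))
feats = edge_conv(pts, w1, k=4)           # (10, 16); next layer would
                                          # call knn_indices on `feats`
```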
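The normalized-exponential weighting in exploration (3) amounts to scoring each modality's per-point feature and combining the two with softmax weights, so the stronger modality contributes more instead of the two being averaged by equal-weight concatenation. The following is a hedged sketch of that fusion step under simplifying assumptions: the learned scoring vectors are passed in as plain arrays, and all names are illustrative rather than the paper's implementation.

```python
import numpy as np

def softmax_weighted_fusion(pc_feat, img_feat, score_pc, score_img):
    """Fuse per-point point-cloud and image features (each shape (N, C))
    with softmax weights derived from scalar modality scores, so the
    contribution of each modality can differ point by point."""
    # Per-point modality scores, stacked to shape (N, 2).
    s = np.stack([pc_feat @ score_pc, img_feat @ score_img], axis=1)
    s = s - s.max(axis=1, keepdims=True)               # numerical stability
    w = np.exp(s) / np.exp(s).sum(axis=1, keepdims=True)  # softmax, (N, 2)
    # Convex combination of the two modality features, shape (N, C).
    return w[:, :1] * pc_feat + w[:, 1:] * img_feat
```

Because the weights come from a softmax, they are positive and sum to 1 for every point; with equal scores the fusion reduces to a plain average, and as one modality's score grows its feature dominates the fused output.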