
Research on a 6D Pose Estimation Method Based on Multimodal Input and Attention Mechanism

Posted on: 2024-09-04 | Degree: Master | Type: Thesis
Country: China | Candidate: W R Zhang
GTID: 2568307157950129 | Subject: Master of Electronic Information (Professional Degree)
Abstract/Summary:
6D pose estimation is a key component of vision-based robotic manipulation and grasping. It remains challenging because real scenes involve varied lighting conditions, cluttered backgrounds, shadows, and occlusions. Compared with traditional algorithms, which are time-consuming and costly, deep-learning-based 6D pose estimation reduces cost while improving the accuracy and speed of object pose estimation. Deep neural networks have therefore become the mainstream approach in this field in recent years. This thesis studies deep-learning-based 6D pose estimation: it creates a synthetic dataset that simulates real scenes, proposes a data augmentation method for extending datasets, and designs two pose estimation algorithms. The two methods differ in the input data required to train the pose estimation network, so that they can meet different practical production needs. The first method suits scenes that use depth cameras; its input is a 3D point cloud model of the object, multiple RGB images, and their corresponding depth images. The second method targets scenes that use only RGB cameras; its input is multiple RGB images and multiple intermediate representations that describe object pose information.

The first method uses both the texture information in the RGB image and the geometric information in the depth image to estimate the 6D pose of the object. This thesis proposes a full-flow bidirectional fusion network that learns from object texture and geometric information jointly. In this way, each of the two networks can use complementary information from the other for better learning. In the output stage, a simple and effective 3D keypoint selection algorithm is designed that takes both the texture and the geometric information of the object into account, which simplifies keypoint localization and achieves accurate pose estimation. Experimental results show that the method outperforms state-of-the-art methods on several benchmarks.

The second method has no depth image or point cloud model as input and therefore lacks a description of the object's geometry. To address this, this thesis proposes a single-view 6D object pose estimation method that combines multimodal input with a channel attention mechanism. The method consists of two modules: a prediction module and a pose regression module. The prediction module uses ResNet-18 as the backbone network and adds a channel attention mechanism to estimate several intermediate representations of the object's 6D pose, including keypoints, edge vectors between keypoints, and symmetry correspondences between pixels. The pose regression module uses the EPnP algorithm and singular value decomposition to recover the 6D pose of the object from the predicted intermediate representations. In the experiments, the method was evaluated on two benchmarks. The results show that it estimates the 6D pose of the object accurately, outperforms existing methods of the same type under the ADD(-S) metric, and is especially robust under occlusion.
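The ADD(-S) metric mentioned above can be illustrated with a minimal NumPy sketch. This is not the thesis's evaluation code; it follows the commonly used definitions, in which ADD averages the distance between corresponding model points under the ground-truth and predicted poses, and ADD-S (used for symmetric objects) averages the distance from each ground-truth point to its closest predicted point. The function names, the toy square model, and the test poses are all assumptions made for illustration.

```python
import numpy as np


def transform(points, R, t):
    """Apply a rigid transform (R, t) to an (N, 3) array of model points."""
    return points @ R.T + t


def add_metric(points, R_gt, t_gt, R_pred, t_pred):
    """ADD: mean distance between corresponding points under the two poses."""
    gt = transform(points, R_gt, t_gt)
    pred = transform(points, R_pred, t_pred)
    return float(np.linalg.norm(gt - pred, axis=1).mean())


def add_s_metric(points, R_gt, t_gt, R_pred, t_pred):
    """ADD-S: mean distance from each ground-truth point to its nearest predicted point."""
    gt = transform(points, R_gt, t_gt)
    pred = transform(points, R_pred, t_pred)
    # Dense (N, N) pairwise distances; acceptable for small illustrative models.
    dists = np.linalg.norm(gt[:, None, :] - pred[None, :, :], axis=2)
    return float(dists.min(axis=1).mean())


def rot_z(angle):
    """Rotation matrix for a rotation of `angle` radians about the z axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


# Hypothetical toy model: corners of a square, symmetric under 90-degree turns about z.
model = np.array([
    [1.0, 1.0, 0.0],
    [-1.0, 1.0, 0.0],
    [-1.0, -1.0, 0.0],
    [1.0, -1.0, 0.0],
])
R_gt, t_gt = np.eye(3), np.zeros(3)
R_pred, t_pred = rot_z(np.pi / 2), np.zeros(3)

add_value = add_metric(model, R_gt, t_gt, R_pred, t_pred)
add_s_value = add_s_metric(model, R_gt, t_gt, R_pred, t_pred)
print(f"ADD = {add_value:.3f}, ADD-S = {add_s_value:.3f}")
```

In this toy case the predicted pose is off by a quarter turn about the symmetry axis, so ADD penalizes it while ADD-S does not, which is why the symmetric variant is typically reported for symmetric objects. A pose is commonly counted as correct when the metric falls below a fraction of the object diameter, often 10%; whether the thesis uses that threshold is not stated in the abstract.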
Keywords/Search Tags: 6D pose estimation, Deep learning, 3D point cloud model, Multimodal input, Attention mechanism