
The Research On Object 6D Pose Estimation Method Based On RGB Image For Open Scenes

Posted on: 2022-09-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S B Zhang
GTID: 1488306734989339
Subject: Software engineering
Abstract/Summary:
Object 6D pose estimation is one of the core problems in 3D perception for artificial intelligence. It aims to help machines better perceive the 3D structure of the world, and it is a key component of robotic grasping, autonomous driving, and augmented reality applications, so it has attracted broad attention from application developers and academic researchers. Most current methods use deep convolutional neural networks to learn the latent relationship between an image and the 6D pose of the object. However, these methods require a large amount of manually labeled training data. The labeling work is time-consuming and needs a complicated setup, and the network easily overfits to a single scene, which degrades performance in open environments and gives poor results in practical applications. In addition, because experimental environments differ from real scenes, it is hard to evaluate how effective object 6D pose estimation algorithms are in practice.

This dissertation addresses object 6D pose estimation from RGB images in open and complex scenes. To solve the problems above, we focus on training the network with cheaply annotated data and on mining essential features of the object pose that do not depend on the environment. The dissertation divides the annotation data that is easy to obtain in real situations into two scenarios: in the first, a large number of synthetic images are generated automatically from the 3D model of the object; in the second, the relative pose transformation between the shooting cameras serves as the annotation. A 6D pose estimation method is proposed for each of these two data settings. To make the model generalize across domains, a graph convolutional network is proposed to learn the essential features common to real and synthetic images, thereby improving the pose estimation model. To verify the practical effect of the proposed algorithms, an augmented reality system is designed that fuses virtual content with reality.

The main work of this dissertation is as follows.

An object 6D pose estimation method trained on synthetic data is proposed. The 3D model of the object is rendered against different backgrounds to generate a large number of synthetic images with pose annotations, which avoids complicated manual annotation. A diversified image generation pipeline and a network pretraining strategy are used to reduce the negative impact of domain shift on pose estimation performance. We also design an end-to-end network architecture that performs keypoint detection, pose estimation, and pose refinement in a fully differentiable way, which removes redundant feature extraction and improves estimation speed. As a result, the pose information can be used directly as labels to optimize the network end to end.

An object 6D pose estimation method that uses relative pose transformation as supervision is proposed. We develop a keypoint-based 6D object pose detection framework. The framework takes paired images and the relative transformation between their viewpoints as training data, and uses a multi-task loss function to train the network to automatically find 3D keypoints on the object that are visually and geometrically consistent. The 6D pose of the object can then be computed from these keypoints at inference time. The relative transformation between viewpoints can be obtained from a camera or a smartphone, which greatly reduces the labeling workload.

A novel method that uses a graph convolutional network to optimize the object 6D pose estimation network is proposed. The method computes the pose of the object by detecting keypoints on it, and employs a graph convolutional network to capture the geometric structure among the keypoints. This geometric structure is a shared, domain-invariant feature of real and synthetic images. By learning these features on synthetic images and using them to guide training on real images, the method improves the network's cross-domain capability in open scenes and reduces its dependence on manually labeled data.

To show that the proposed object 6D pose estimation algorithms can be applied in practice, this dissertation designs an augmented reality system to verify their practicality and stability. In the scenario of rendering virtual content on cultural relics, the proposed algorithms and training strategy are shown to be effective.
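The idea of supervising with the relative transformation between two viewpoints can be illustrated with a small geometric sketch. This is not the dissertation's implementation; it only shows the consistency constraint that such supervision could rely on, assuming poses are written as a rotation matrix R and translation t with x_cam = R x_obj + t. The helper names (`compose`, `consistency_error`, `rot_z`) and the example values are hypothetical.

```python
import math

def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply(R, t, p):
    """Map a 3D point p through the rigid transform x -> R x + t."""
    return [sum(R[i][k] * p[k] for k in range(3)) + t[i] for i in range(3)]

def compose(R_ab, t_ab, R_a, t_a):
    """Given an object pose in frame a and the a->b transform,
    return the object pose in frame b."""
    return matmul3(R_ab, R_a), apply(R_ab, t_ab, t_a)

def rot_z(deg):
    """Rotation about the z axis by `deg` degrees (illustrative only)."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def consistency_error(pose1, pose2, rel, keypoints):
    """Mean distance between keypoints placed by the view-2 pose and by
    the view-1 pose pushed through the known relative transform.
    A value near zero means the two pose estimates agree."""
    R2_hat, t2_hat = compose(rel[0], rel[1], pose1[0], pose1[1])
    total = 0.0
    for p in keypoints:
        total += math.dist(apply(R2_hat, t2_hat, p),
                           apply(pose2[0], pose2[1], p))
    return total / len(keypoints)

# Hypothetical example: object-frame keypoints and two camera views.
keypoints = [[0.1, 0.0, 0.0], [0.0, 0.1, 0.0], [0.0, 0.0, 0.1]]
pose1 = (rot_z(30), [0.1, 0.0, 1.0])   # estimated pose in view 1
rel = (rot_z(15), [0.0, 0.2, 0.0])     # known camera 1 -> camera 2 motion
pose2 = compose(rel[0], rel[1], pose1[0], pose1[1])
print(consistency_error(pose1, pose2, rel, keypoints))
```

In a training setting, an error of this kind could serve as one term of a multi-task loss, since it needs only the camera motion between the two images and no per-image pose label.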
Keywords/Search Tags: computer vision, object 6D pose estimation, keypoint detection, graph convolutional network, transfer learning, augmented reality