
Category-level 3D Object Tracking Model Based on Inter-Frame Correspondence

Posted on: 2024-03-22
Degree: Master
Type: Thesis
Country: China
Candidate: X Cao
GTID: 2558306920951129
Subject: Software engineering
Abstract/Summary:
With the continuous development of computer vision, pose estimation has gradually entered many areas of production and daily life, and it plays a key role in applications that require interaction with the real world. Depending on the targets, backgrounds, and conditions involved, a variety of tasks have emerged, such as bounding box estimation, object detection, object tracking, and human pose estimation, and these tasks have advanced fields such as human-computer interaction, augmented reality, autonomous driving, and robotic manipulation. In recent years, as instance-level tasks and deep learning have matured, category-level pose estimation has also developed. Instance-level object pose estimation tasks, such as instance-level object detection and object tracking, already achieve very accurate results. However, obtaining a 3D model of the target object in practical application scenarios is often costly, which makes it difficult for instance-level algorithms to generalize to new objects. Category-level pose estimation emerged to address this. Like its instance-level counterpart, it processes input data to estimate the pose of a target object; the difference is that category-level methods do not need the target object's model in advance, can estimate the pose of any object within a given category, and can handle objects that have not been seen before.

Currently, the mainstream approach to both category-level single-frame pose estimation and sequential 3D tracking is to construct point cloud correspondences between different coordinate spaces and then solve for the relative pose. Existing ways of building these correspondences mainly include directly regressing the object pose and keypoint matching. This thesis proposes a data processing and denoising module to address registration errors caused by noisy and outlier points, and a model based on inter-frame correspondence to address the poor "local-global" matching performance of existing category-level pose estimation methods. To address the error accumulation caused by sequential tracking, it further proposes a pose optimization module based on normalized coordinates. The main research content is as follows:

1. Data processing and denoising module. Because of factors such as sensor accuracy and scene changes, the collected point cloud inevitably contains noise, which can seriously degrade the accuracy and effectiveness of point cloud registration and feature extraction. To address this, we designed a data processing and denoising module that uses a tight bounding box defined in a normalized object coordinate space, together with a 3D graph convolutional network, to process noise in the observed point cloud. The module filters out noise points and depth acquisition errors that do not belong to the object, significantly improving the accuracy of subsequent point cloud registration.
2. Inter-frame pose estimation module. This module estimates the relative pose between two different frames. It extracts locally invariant geometric information for each point of the observed point cloud and uses a parameter estimation network to generate dynamically adaptive parameters for the data. A correspondence matrix is then generated to estimate the correspondences between point pairs, and once these correspondences are obtained, the Umeyama algorithm is used to compute the pose transformation between the two point clouds. By predicting the pose transformation between adjacent frames, the pose of every frame of point cloud in the sequence can be estimated.
3. Pose optimization module. This module takes the reference frame as the reference and the estimate from the previous frame as the reference pose. By combining these with the frame-to-frame pose estimate between the previous and current frames, it transforms the point clouds of the reference and current frames so as to eliminate the errors accumulated by continuous frame-to-frame matching. In this way, the error distribution accumulated over multiple frames is calibrated, through the reference and current frames, to the error distribution between adjacent frames, which improves the accuracy of single-frame pose estimation.
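The bounding-box filtering in item 1 can be illustrated with a minimal NumPy sketch. It assumes a coarse pose (rotation `rot`, translation `trans`) and a `scale` that map normalized object coordinates into the camera frame as p_cam = scale · rot · p_nocs + trans, plus hypothetical per-axis half-extents for the tight box. The function names, the 10% margin, and the synthetic data are illustrative assumptions, and the 3D graph convolutional network that handles the remaining noise is not reproduced.

```python
import numpy as np

def to_nocs(points_cam, rot, trans, scale):
    """Map camera-frame points (N, 3) into normalized object coordinates,
    assuming the (hypothetical) model p_cam = scale * rot @ p_nocs + trans."""
    # Right-multiplying row vectors by rot applies rot.T to each point.
    return (points_cam - trans) @ rot / scale

def filter_by_tight_box(points_cam, rot, trans, scale, half_extent, margin=1.1):
    """Keep points whose normalized coordinates lie inside the tight box
    [-half_extent, half_extent], enlarged by `margin` to tolerate pose error."""
    nocs = to_nocs(points_cam, rot, trans, scale)
    mask = np.all(np.abs(nocs) <= margin * np.asarray(half_extent), axis=1)
    return points_cam[mask], mask

# Synthetic example: object points inside the box plus background points.
rng = np.random.default_rng(0)
half_extent = np.array([0.30, 0.20, 0.40])       # hypothetical normalized box
angle = np.radians(30.0)
rot = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                [np.sin(angle), np.cos(angle), 0.0],
                [0.0, 0.0, 1.0]])
trans, scale = np.array([0.1, -0.2, 1.5]), 0.25
obj_nocs = rng.uniform(-half_extent, half_extent, size=(200, 3))
bg_nocs = rng.uniform(-1.0, 1.0, size=(50, 3))
bg_nocs[:, 2] = 2.0                               # e.g. a surface behind the object
cloud = np.vstack([obj_nocs, bg_nocs]) * scale @ rot.T + trans
kept, mask = filter_by_tight_box(cloud, rot, trans, scale, half_extent)
print(mask[:200].all(), mask[200:].any())         # → True False
```

The synthetic background points sit at a normalized depth of 2, far outside the box, so they are discarded while all object points survive. Testing in normalized coordinates makes the threshold independent of the object's metric size, since the scale is divided out.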
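Item 2 pairs a correspondence matrix with the Umeyama algorithm. The sketch below assumes per-point descriptors are already available (random vectors stand in for the learned, locally invariant features), builds a row-stochastic correspondence matrix with a softmax over cosine similarities at a fixed `temperature` (a hypothetical stand-in for the dynamically adaptive parameters produced by the parameter estimation network), and then recovers the similarity transform with Umeyama's closed-form least-squares solution. Function and parameter names are illustrative, not taken from the thesis.

```python
import numpy as np

def soft_correspondence(feat_a, feat_b, temperature=0.02):
    """Row-stochastic matrix C (N, M): C[i, j] is the weight with which
    point i of frame A is matched to point j of frame B (softmax over
    cosine similarities; `temperature` is a hypothetical fixed parameter)."""
    a = feat_a / np.linalg.norm(feat_a, axis=1, keepdims=True)
    b = feat_b / np.linalg.norm(feat_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    c = np.exp(logits)
    return c / c.sum(axis=1, keepdims=True)

def umeyama(src, dst):
    """Least-squares similarity transform with dst_i ~= s * R @ src_i + t
    (Umeyama, 1991). Returns (s, R, t)."""
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    xs, ys = src - mu_src, dst - mu_dst
    cov = ys.T @ xs / len(src)                    # 3x3 cross-covariance
    u, d, vt = np.linalg.svd(cov)
    sign = np.eye(3)
    if np.linalg.det(u) * np.linalg.det(vt) < 0:  # exclude reflections
        sign[2, 2] = -1.0
    rot = u @ sign @ vt
    var_src = (xs ** 2).sum() / len(src)
    scale = np.trace(np.diag(d) @ sign) / var_src
    return scale, rot, mu_dst - scale * rot @ mu_src

# Synthetic frames: B is a scaled, rotated, shifted and shuffled copy of A.
rng = np.random.default_rng(0)
pts_a = rng.uniform(-0.5, 0.5, size=(100, 3))
q, r = np.linalg.qr(rng.normal(size=(3, 3)))
rot_true = q * np.sign(np.diag(r))
if np.linalg.det(rot_true) < 0:
    rot_true[:, 0] *= -1.0
s_true, t_true = 1.2, np.array([0.05, -0.1, 0.3])
perm = rng.permutation(100)
pts_b = (s_true * pts_a @ rot_true.T + t_true)[perm]
# Stand-in for learned per-point descriptors (hypothetical, not a network).
feat_a = rng.normal(size=(100, 64))
feat_b = feat_a[perm] + 0.01 * rng.normal(size=(100, 64))

corr = soft_correspondence(feat_a, feat_b)
s_est, rot_est, t_est = umeyama(pts_a, corr @ pts_b)
print(round(float(s_est), 3))                     # → 1.2
```

Each row of `corr` turns into a soft-matched target point (`corr @ pts_b`), so Umeyama always receives index-aligned pairs even though frame B's points are shuffled. A lower temperature sharpens the matching toward a hard assignment; a higher one averages over more candidate points.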
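The last sentence of item 2 and the refinement in item 3 can be illustrated together. The sketch chains frame-to-frame estimates into absolute poses (T_k = T_{k-1→k} · T_{k-1}), injects a hypothetical 0.1° per-frame rotation bias so that drift becomes visible, and compares this with a refinement step that predicts the current pose from the previous estimate, moves the reference-frame cloud into the current frame, and re-registers it against the current observation. It assumes the reference pose is known and that reference and current points are index-matched (in the described method these correspondences would come from the learned matching); the rigid alignment is the scale-free form of the Umeyama solution, and all names are hypothetical.

```python
import numpy as np

def to_h(rot, trans):
    """4x4 homogeneous pose from a 3x3 rotation and a translation vector."""
    pose = np.eye(4)
    pose[:3, :3], pose[:3, 3] = rot, trans
    return pose

def transform(pose, pts):
    """Transform row-vector points (N, 3) by a 4x4 pose."""
    return pts @ pose[:3, :3].T + pose[:3, 3]

def rigid_align(src, dst):
    """Umeyama/Kabsch solution with scale fixed to 1: R, t minimising
    sum ||R @ src_i + t - dst_i||^2 for index-matched points."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    u, _, vt = np.linalg.svd((dst - mu_d).T @ (src - mu_s))
    d = np.sign(np.linalg.det(u @ vt))            # exclude reflections
    rot = u @ np.diag([1.0, 1.0, d]) @ vt
    return rot, mu_d - rot @ mu_s

def refine_with_reference(ref_pts, ref_pose, prev_pose, rel_pose, cur_pts):
    """Hypothetical drift correction: chain the previous estimate with the
    frame-to-frame estimate, move the reference-frame cloud into the current
    frame with that prediction, then re-register it against the current cloud."""
    init_pose = rel_pose @ prev_pose
    ref_in_cur = transform(init_pose @ np.linalg.inv(ref_pose), ref_pts)
    correction = to_h(*rigid_align(ref_in_cur, cur_pts))
    return correction @ init_pose

def rot_err_deg(a, b):
    """Geodesic angle (degrees) between the rotation parts of two poses."""
    cos = (np.trace(a[:3, :3].T @ b[:3, :3]) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def rot_z(deg):
    c, s = np.cos(np.radians(deg)), np.sin(np.radians(deg))
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Synthetic sequence: 2 degrees of rotation and 1 cm of translation per frame.
rng = np.random.default_rng(1)
model = rng.uniform(-0.05, 0.05, size=(1000, 3))
true = [to_h(rot_z(2.0 * k), np.array([0.01 * k, 0.0, 1.0])) for k in range(31)]

def observe(pose):
    """Observed cloud: index-matched model points plus 0.5 mm sensor noise."""
    return transform(pose, model) + rng.normal(scale=5e-4, size=model.shape)

bias = to_h(rot_z(0.1), np.zeros(3))     # systematic frame-to-frame error
ref_pts = observe(true[0])               # reference frame, pose assumed known
chained, refined = [true[0]], [true[0]]
for k in range(1, 31):
    rel = bias @ true[k] @ np.linalg.inv(true[k - 1])   # biased T_{k-1 -> k}
    chained.append(rel @ chained[-1])
    refined.append(refine_with_reference(ref_pts, true[0], refined[-1],
                                         rel, observe(true[k])))

print(round(float(rot_err_deg(chained[-1], true[-1])), 3))  # → 3.0
print(rot_err_deg(refined[-1], true[-1]) < 0.5)             # → True
```

With the bias, the chained estimate drifts by 0.1° per frame (3° after 30 frames), whereas the refined estimate stays at the noise level because each frame is registered back to the reference. Because the toy uses exact index correspondences, the re-registration here does not depend on how accurate the chained prediction is; that simplification is specific to the sketch.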
Keywords/Search Tags: Pose Estimation, Category-level, 3D Tracking, Correspondence Estimation