Research on Visual Object Detection, Pose Estimation, and Tactile Localization for Robot Grasping

Posted on: 2024-06-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Y Li
GTID: 1528307340975239
Subject: Computer Science and Technology
Abstract/Summary:
Stable grasping of objects with a robotic arm is a fundamental capability for autonomous operation and intelligent interaction. The current mainstream approach to making this process explainable is to model the objects and the grasping process through visual and tactile feedback, which yields the position, orientation, and contact feedback during grasping. Within this approach, the grasping process is divided into multiple technical stages. The three most important are identifying and localizing objects from a single image, estimating the six-dimensional (6D) pose of objects, and obtaining precise contact positions during grasping. Challenges remain in each stage. Object recognition and high-precision localization are still difficult, particularly in scenes with densely arranged objects of the same category. Object pose estimation suffers from the high cost and slow processing speed of point cloud models, as well as weak intra-class generalization. During the final execution of a grasp, it is difficult to accurately estimate the contact position between the robot's end-effector and the object, so the object's contact state cannot be determined.

To address these challenges, this dissertation first focuses on object localization accuracy and feature confusion in robotic-arm grasping based on RGB images, and proposes a gradient corner pooling encoding method designed for densely arranged objects of the same category. It then establishes a sketch model knowledge base of object geometric components and a high-speed 6D pose estimation method. Finally, because contact positions during grasping are hard to estimate accurately from visual information, it introduces the tactile modality for high-precision super-resolution modeling of the contact positions between the object and the robot end-effector. The main research work and innovations can be summarized in the following three parts:

1. Object localization method based on spatial position embedding pooling. Object recognition and localization are core tasks in object detection. Detecting objects as multiple keypoints is a class of object detection methods with high localization accuracy, and localization accuracy can be improved further by using corner pooling to locate the keypoints of object bounding boxes. Corner pooling locates the corner points of a bounding box by performing max pooling along the x and y directions of the feature map and summing the results. However, in these one-dimensional max pooling operations, the features of densely arranged objects of the same category are easily confused and occluded. To address this issue, we propose a gradient corner pooling encoding method that encodes the spatial distance between features of same-category objects on the feature map, effectively resolving feature occlusion among objects of the same category. Gradient corner pooling encoding can be computed quickly through block-wise comparisons and has the same computational complexity as traditional corner pooling. It provides consistent improvement across various keypoint-based methods. Replacing the corresponding modules in the baseline pipeline with the proposed method yields an average precision improvement of over 6.5% on dense objects in the MS-COCO dataset. In addition, object detectors using gradient corner pooling encoding adapted better to object angles in real-world scenario tests.

2. Object geometric component sketch models and a fast 6D pose estimation method. Most instance-level or category-level 6D pose estimation methods require accurate computer-aided design (CAD) models or point cloud models. However, acquiring 3D models of everyday objects is challenging, and methods that rely on precise models often generalize weakly to different instances of the same object category. To address these challenges, this dissertation proposes an object geometric component sketch knowledge base, which includes 270 simplified graphical models of real-world objects from 30 categories. Objects are decomposed into geometric components based on their functionality and structure, and each component is represented by one of three fundamental spatial structures: frustum, circular truncated cone, or sphere. A fast sketch-based modeling tool and workflow are also introduced; creating a sketch model of an everyday object with this method takes approximately 2 minutes on average. Furthermore, by projecting the component models from multiple views, a fast 6D pose inference framework is developed based on the geometric information and spatial relationships of the components in both 6D space and projection space. The method uses geometric constraints from the component projections to retrieve and score viable solutions in a discrete 6D pose space, gradually constraining the solution space and keeping the solving process interpretable. Extensive experiments in real-world environments demonstrate reliable and robust 6D pose estimation, even without precise CAD models or point cloud models. The method can also exploit the parallel computing capabilities of GPUs, reaching a processing speed of 90 frames per second and attaining state-of-the-art performance.

3. Object contact position super-resolution modeling based on spatio-temporal continuity learning. To obtain accurate tactile feedback during grasping, this dissertation proposes a super-resolution method for robot tactile localization that combines learning the spatio-temporal continuity of contact positions with a tactile sensor composed of overlapping air chambers. Each overlapping air chamber is constructed from soft materials and sealed with internal pressure sensors to mimic the adaptive receptors of human skin. Each pressure sensor obtains a global receptive field over the contact surface through pressure conduction in the highly elastic, sealed, overlapping air chambers. Causal convolution is employed to analyze the multiple pressure signals and predict the contact position, and the spatio-temporal continuity of the contact position contributes to the precision and stability of localization. Using only four physical sensing nodes on a rubber surface (with an average localization error of 0.1 mm over a 38 mm × 26 mm area), we achieved a super-resolution (SR) factor exceeding 2500, which is currently the leading performance. The influence of the causal convolution's time series length on the accuracy of position prediction is also analyzed quantitatively. Based on this sensor and modeling method, this dissertation demonstrates that robots can accomplish challenging tasks such as tactile trajectory tracking, adaptive grasping, and human-robot interaction through tactile sensing.

This study systematically integrates the key algorithms described above with the tactile sensors, achieving fast and accurate implementation of various applications in scenarios such as robot grasping. It validates the effectiveness of the algorithms in practical situations and shows promising application prospects.
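
For readers unfamiliar with the corner pooling operation referred to in part 1 of the abstract, the following is a minimal sketch of standard top-left corner pooling (directional max pooling along x and y, then summation). It is not the gradient corner pooling encoding proposed in the dissertation, whose details the abstract does not give. The function names `pool_right`, `pool_down`, and `top_left_corner_pool`, the use of NumPy, and the use of a single shared feature map for both directions are assumptions made for illustration only.

```python
import numpy as np

def pool_right(fmap: np.ndarray) -> np.ndarray:
    """For each cell (i, j), take the max of row i from column j to the right edge."""
    # Reverse columns, take a running max, then reverse back.
    return np.maximum.accumulate(fmap[:, ::-1], axis=1)[:, ::-1]

def pool_down(fmap: np.ndarray) -> np.ndarray:
    """For each cell (i, j), take the max of column j from row i to the bottom edge."""
    # Reverse rows, take a running max, then reverse back.
    return np.maximum.accumulate(fmap[::-1, :], axis=0)[::-1, :]

def top_left_corner_pool(fmap: np.ndarray) -> np.ndarray:
    """Standard top-left corner pooling: sum of the two directional max pools.

    Simplification (assumption): one feature map is used for both directions.
    """
    return pool_right(fmap) + pool_down(fmap)

# A single row containing responses from two same-category objects
# (hypothetical values chosen only to illustrate the pooling behaviour).
row = np.array([[0.0, 0.6, 0.0, 0.0, 0.9, 0.0]])
pooled_row = pool_right(row)

# A small 2-D map to show the summed result.
small = np.array([[1.0, 0.0],
                  [0.0, 2.0]])
pooled_small = top_left_corner_pool(small)
```

In the single-row example, every position to the left of the stronger response (0.9) receives that value, so the weaker response (0.6) of the nearer object no longer influences the pooled output at the leftmost position. This is the kind of confusion between densely arranged same-category objects that the abstract says gradient corner pooling encoding is designed to resolve.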
Keywords/Search Tags: Robot Grasp, Object Detection, Object 6D Pose Estimation, Knowledge Base, Tactile Localization Super-Resolution