Mask R-CNN-based Target Recognition And Pose Acquisition Research

Posted on: 2024-09-22
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Han
GTID: 2568307076491154
Subject: Engineering
Abstract/Summary:
In complex multi-object scenes, accurately recognising and localising target objects in the input image is a prerequisite for autonomous robotic grasping. However, most current deep-learning-based 6D (six-dimensional) pose estimation methods perform only the pose estimation task, and their accuracy degrades for objects with complex backgrounds, weak textures, or occlusions. If two unrelated convolutional neural networks are used to perform target recognition and pose estimation separately, the real-time performance of the algorithm is severely limited. This thesis therefore designs a two-stage object pose estimation algorithm for target recognition and pose acquisition, aimed at autonomous robots performing grasping tasks in complex environments. The algorithm performs target recognition and pose estimation together, and uses the recognition results to reduce interference from irrelevant objects and complex backgrounds. It combines instance segmentation, coordinate regression, and PnP (Perspective-n-Point) to identify objects and estimate their poses. The work has two main parts.

(1) An instance segmentation method based on an improved Mask R-CNN. The modified Mask R-CNN addresses two problems: gradients that may vanish or explode in the deeper layers of the network during training, and the loss of image information that still occurs in the upper convolutional layers of the FPN (Feature Pyramid Network). First, the feature extraction network is improved by replacing the original ResNet-50 with ResNeXt-50. Second, the convolution kernels in the FPN are expanded to enlarge their receptive field. In addition, the LineMOD dataset is converted to VOC format so that the model can read it. Finally, after training on the LineMOD and Pascal VOC datasets, the improved model is validated on a test set and compared with the original model on the average precision (AP) metric. The experimental results show that the improved algorithm is more accurate than the original Mask R-CNN in both bounding box detection and mask segmentation of target objects.

(2) A dual-strategy 6D object pose estimation method based on coordinate regression. Current 6D pose estimation algorithms have low accuracy for weakly textured objects, complex backgrounds, and occluded objects. To address this, two different strategies are used to predict the rotation and the translation of the pose. For translation, the relative offset that the translation estimation network predicts on the local image is combined with the predicted bounding box of the target in the original image, and the translation is obtained by direct regression. For rotation, the rotation estimation network predicts a 3D coordinate for each 2D pixel in the local image, which gives a set of 2D-3D point correspondences. The correspondences that belong to target pixels are extracted using the mask, and the rotation matrix is then solved with the PnP algorithm. To eliminate the influence of pixels in the background region, a mask loss function is also designed that helps the network predict more accurate coordinates. Finally, the model is trained on the LineMOD dataset and evaluated on its test set. The comparative experimental results show that the pose estimation network developed in this thesis is more accurate and reliable for localising target objects.
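
The mask-based steps described for the rotation branch can be sketched as follows. This is a minimal NumPy illustration and not the thesis's implementation: the function names, the L1 form of the loss, and the `crop_origin` and `scale` parameters are assumptions introduced here. The abstract states only that a mask restricts the loss and the 2D-3D point pairs to target pixels.

```python
import numpy as np


def masked_coordinate_loss(pred_coords, gt_coords, mask):
    """Mean L1 coordinate error over foreground (target) pixels only.

    pred_coords, gt_coords: (H, W, 3) per-pixel 3D object coordinates.
    mask: (H, W) array, truthy on pixels that belong to the target.
    The L1 form is an assumption; the abstract does not give the formula.
    """
    mask = mask.astype(bool)
    n_fg = int(mask.sum())
    if n_fg == 0:
        return 0.0  # no target pixels, so nothing to penalise
    abs_err = np.abs(pred_coords - gt_coords)  # (H, W, 3)
    # Boolean indexing keeps only foreground rows: shape (n_fg, 3).
    return float(abs_err[mask].sum() / (3 * n_fg))


def extract_correspondences(pred_coords, mask, crop_origin=(0.0, 0.0), scale=1.0):
    """Collect 2D-3D point pairs for the masked pixels of a local image.

    crop_origin: hypothetical (x0, y0) of the crop's top-left corner in
        the original image, in pixels.
    scale: hypothetical number of original-image pixels per crop pixel.
    Returns (N, 2) image points in original-image pixels and (N, 3)
    predicted object-space points, in matching row order.
    """
    ys, xs = np.nonzero(mask)  # row (y) and column (x) indices of target pixels
    x0, y0 = crop_origin
    image_points = np.stack([x0 + xs * scale, y0 + ys * scale], axis=1)
    object_points = pred_coords[ys, xs]
    return image_points.astype(np.float64), object_points.astype(np.float64)
```

The two arrays returned by `extract_correspondences` have the shapes that a PnP solver such as OpenCV's `cv2.solvePnPRansac` accepts, together with the camera intrinsic matrix. Background pixels contribute neither to the loss nor to the point set, which is the effect the mask is described as providing.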
Keywords/Search Tags:instance segmentation, pose estimation, target recognition, mask loss, feature extraction networks