In recent years, with the development of society and technology, robots have been applied ever more widely in industrial production and daily life. Accurate recognition of targets in a scene and estimation of the grasping pose are the basis of stable grasping. The continuous progress of deep learning has driven advances in computer vision, and applying deep learning to target detection and grasp pose estimation is an important research direction. This paper focuses on improving the accuracy of target detection and grasp pose estimation for common everyday objects. The work of this paper includes the following.

Firstly, this paper studies deep-learning-based object detection algorithms. The basic structure of convolutional neural networks and three classical feature extraction networks are studied first. Then the current mainstream two-stage and single-stage deep-learning-based target detection algorithms are analyzed. The two-stage detector Faster R-CNN and the single-stage detector YOLOv5 are compared experimentally on a self-built dataset. Based on the experimental results, YOLOv5, which offers better real-time performance, is selected as the base network for improvement.

Secondly, this paper studies deep-learning-based grasp pose estimation algorithms and compares the advantages and disadvantages of different end-effectors. The representation of a parallel-plate end-effector's grasp pose in image space is analyzed, and an improved five-dimensional grasp rectangle augmented with a grasp-quality parameter is adopted as the grasp pose representation in this paper. The current mainstream deep-learning-based grasp pose estimation algorithms are analyzed, and the single-stage MultiGrasp network and the generative GG-CNN network are selected for experimental comparison on the Cornell
grasp dataset. Based on the experimental results, GG-CNN, a fully convolutional network with a relatively simple model structure, is selected as the base grasp pose estimation network for improvement.

Thirdly, the backbone, neck, and detection head of YOLOv5 are improved to address the missed detection of small targets and inaccurate target localization observed in actual detection. A self-built rotated-box dataset is constructed, and ablation experiments on the improved YOLOv5 are carried out on this dataset. The average accuracy of the improved YOLOv5 reaches 88.92%. Compared with the original horizontal bounding box, the rotated bounding box describes the position of an object more accurately.

Finally, to address the low prediction accuracy of GG-CNN, a ResBlock module, an Inception module, and a spatial-channel attention module are introduced to deepen the network. To deepen the network while adding as little computation as possible, depthwise separable convolution is introduced to replace standard convolution. Since the grasp pose is easily affected by the object contour, the shallow contour features extracted by the feature extraction network are fused with the deep semantic features. The improved GG-CNN is trained on the Cornell grasping dataset and reaches a final average accuracy of 98.8% on the test set. Combined with the target position information provided by the improved YOLOv5, the improved GG-CNN can estimate the grasping pose of a specified target.
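The quality-augmented five-dimensional grasp rectangle used as the pose representation above can be sketched as a simple data structure; the field names and the sample values below are illustrative assumptions, not taken from the thesis:

```python
from dataclasses import dataclass
import math

@dataclass
class GraspPose:
    """Five-dimensional grasp rectangle in image space, extended with a
    grasp-quality score q (field names are illustrative assumptions)."""
    x: float      # center column of the grasp rectangle (pixels)
    y: float      # center row of the grasp rectangle (pixels)
    theta: float  # rotation of the rectangle about the image axis (radians)
    w: float      # opening width of the parallel-plate gripper (pixels)
    h: float      # height of the gripper plates (pixels)
    q: float      # predicted grasp quality in [0, 1]

def best_grasp(candidates):
    """Select the candidate grasp with the highest predicted quality."""
    return max(candidates, key=lambda g: g.q)

grasps = [
    GraspPose(120, 80, 0.0, 40, 15, 0.62),
    GraspPose(118, 83, math.pi / 4, 35, 15, 0.91),
]
print(best_grasp(grasps).q)  # -> 0.91
```

Keeping the quality score inside the representation is what lets a generative network such as GG-CNN rank all candidate grasps and execute the highest-scoring one.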
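The parameter saving from depthwise separable convolution mentioned above can be verified with simple counting: a standard k×k convolution uses C_in·C_out·k² weights, while a depthwise k×k convolution followed by a 1×1 pointwise convolution uses C_in·k² + C_in·C_out. The layer sizes below are illustrative, not the thesis's actual configuration:

```python
def conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def dw_separable_params(c_in, c_out, k):
    """Depthwise k x k convolution (one filter per input channel)
    followed by a 1 x 1 pointwise convolution that mixes channels."""
    depthwise = c_in * k * k
    pointwise = c_in * c_out
    return depthwise + pointwise

# Example layer: 32 -> 64 channels with a 3 x 3 kernel.
std = conv_params(32, 64, 3)          # 32 * 64 * 9  = 18432
sep = dw_separable_params(32, 64, 3)  # 288 + 2048   = 2336
print(std, sep, round(std / sep, 1))  # -> 18432 2336 7.9
```

For this example layer the separable form needs roughly 8x fewer weights, which is why it lets the improved GG-CNN grow deeper without a proportional growth in computation.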