The robot arm is the most widely used mechanical device in the robotics industry; it can imitate the human arm to perform a variety of operations. Traditional robot arms learn to execute tasks through the "teaching-reproduction" method, which records only the position and pose of each joint, so it is difficult to reproduce a task when objects are placed at random. The semantic parsing method for robot arm task execution studied in this paper aims to generate a semantic parsing graph of task execution, including the action sequence and the object acted on by each action, so that the robot arm can understand the high-level semantics of a task while learning it. Because data are difficult to collect in real scenes, whereas data in simulation space are accurate, rich, and easy to obtain, this paper studies semantic parsing methods for robot arm task execution in simulation space, from the following two aspects.

(1) A robot arm task data set was constructed. State data of the task execution process were collected from the simulation space, including the category and number of each object, its three-dimensional position, size, and orientation, the open-closed state of the gripper, the image sequence, and the two-dimensional position of each object in the images. To obtain a large number of semantic parsing results quickly, this paper proposes a semantic parsing method for robot arm task execution that analyzes the motion of the gripper and the other objects to parse and segment a task, yielding the action sequence and the role object of each action. After manual correction, these parsing results can serve as ground-truth labels for task parsing, which saves considerable time compared with fully manual annotation.

(2) To be closer to practical applications, this paper proposes a semantic parsing method for robot arm task execution based on image sequences. First, the YOLOv5-DeepSORT model detects and tracks multiple objects in the image sequences collected in the simulation space, producing the trajectories of the gripper and the other objects, including the trajectory ID, category, position, and size. Then a CNN-based classification network recognizes the open-closed state sequence of the gripper from the image sequence. Finally, the above results are fed into an LSTM-based sequence segmentation and action recognition model, which outputs the action sequence and the role object of each action. To address the scale changes caused by objects being randomly placed and moved in the task scene, this paper improves YOLOv5 by 1) adding an attention mechanism and 2) adding a detection head to the prediction layer. Experimental results show that the improved model performs better on the test set. Given accurate input data, both the CNN-based classification network and the LSTM-based sequence segmentation and action recognition model also perform well on the test set.

The robot arm task data set constructed in simulation space can be used for research on task parsing, object detection, object tracking, and action recognition. The proposed semantic parsing method gives the robot arm the ability to analyze tasks autonomously during task learning and to understand the high-level semantics of task execution. In addition, the YOLOv5 object detection work in this paper can also be applied to visual servo control: during task execution, the detector can update the object information in the field of view and drive hardware such as the arm to complete the corresponding operations.
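The rule-based parsing idea in aspect (1), segmenting a task by watching the gripper's open/close signal and the objects near it, can be illustrated with a toy sketch. Everything here (the function names, the grasp/place labels, the nearest-object rule) is an assumption for illustration, not the thesis's actual rules:

```python
import math

def segment_actions(gripper_open, gripper_pos, obj_traj):
    """Split a recorded trajectory into (action, object) steps.

    gripper_open : list[bool], per-frame open (True) / closed (False)
    gripper_pos  : list[tuple], per-frame 3-D gripper position
    obj_traj     : dict name -> list[tuple], per-frame 3-D object positions

    Illustrative rule: a closing edge is labelled "grasp", an opening
    edge "place", with the nearest object as the action's role object.
    """
    actions = []
    for t in range(1, len(gripper_open)):
        if gripper_open[t - 1] != gripper_open[t]:
            # nearest object at the moment the gripper state flips
            obj = min(obj_traj,
                      key=lambda o: math.dist(gripper_pos[t], obj_traj[o][t]))
            label = "grasp" if not gripper_open[t] else "place"
            actions.append((label, obj))
    return actions
```

After manual correction, such automatically generated (action, object) pairs could serve as labels in the way the abstract describes.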
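The tracking-by-detection stage of aspect (2) can be illustrated without the full YOLOv5-DeepSORT stack. The greedy IoU association below is a deliberately simplified stand-in: DeepSORT additionally uses a Kalman motion model and appearance embeddings for association, and the detection format here is assumed, not the thesis's:

```python
def iou(a, b):
    # intersection-over-union of two boxes given as (x1, y1, x2, y2)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def track(frames, iou_thresh=0.3):
    """Greedy IoU tracker. frames is a list of per-frame detection
    lists, each detection a (class_name, box) pair, e.g. the output of
    a detector such as YOLOv5. Returns per-frame lists of
    (track_id, class_name, box)."""
    next_id, prev, out = 0, [], []
    for dets in frames:
        cur, used = [], set()
        for cls, box in dets:
            # best overlapping same-class track from the previous frame
            best, best_iou = None, iou_thresh
            for tid, pcls, pbox in prev:
                if tid in used or pcls != cls:
                    continue
                v = iou(box, pbox)
                if v > best_iou:
                    best, best_iou = tid, v
            if best is None:          # unmatched detection starts a track
                best, next_id = next_id, next_id + 1
            used.add(best)
            cur.append((best, cls, box))
        out.append(cur)
        prev = cur
    return out
```

The trajectory output (ID, category, position, size) mirrors the per-object trajectory information the abstract says the tracking stage produces.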
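Downstream, a sequence model that classifies every frame still needs its per-frame labels collapsed into action segments. One common post-processing step (the thesis's exact segmentation scheme may differ) is:

```python
def merge_frame_labels(frame_labels):
    """Collapse per-frame action labels (e.g. the argmax output of an
    LSTM classifier) into (action, start_frame, end_frame) segments."""
    segs, start = [], 0
    for t in range(1, len(frame_labels) + 1):
        # close the current segment at the end or on a label change
        if t == len(frame_labels) or frame_labels[t] != frame_labels[start]:
            segs.append((frame_labels[start], start, t - 1))
            start = t
    return segs
```

The resulting (action, interval) list is the shape of output the abstract describes: an action sequence, which can then be paired with the role object of each action.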