
Learning Robot Manipulation Commands From Long Demonstration Videos

Posted on: 2022-08-30
Degree: Master
Type: Thesis
Country: China
Candidate: Z M Zhu
Full Text: PDF
GTID: 2518306539969339
Subject: Computer Science and Technology
Abstract/Summary:
Most existing robots complete specific tasks according to preset programs or instructions, which cannot meet people's needs for personalization and customization and thus limits the application and development of robots. Video command learning is an important way to empower robots: through it, a robot can understand human behavioral intentions and learn skills independently, without cumbersome pre-programming steps. This thesis studies command learning from untrimmed videos and proposes a robot manipulation command learning framework based on an action segmentation network.

A demonstration video usually contains a series of manipulation actions whose start and end times are unknown. To address this problem, this thesis proposes a video action segmentation framework based on a multi-stage atrous pyramid network. The network uses an atrous convolution pyramid module to capture multi-scale action features and a multi-stage architecture to refine the segmentation results, thereby predicting the action class of each frame and segmenting the untrimmed video into a series of clips that can be perceived and analyzed.

Building on the action segmentation framework, this thesis proposes a demonstration-video-oriented robot command learning framework that learns robot command sequences from untrimmed demonstration videos. The framework contains three main modules: an action segmentation module, an object recognition module, and a command generation module. The action segmentation module segments the video into a series of clips. In the object recognition module, an object detection model extracts object features, these are merged with the action features, and a classifier identifies the participating objects. In the command generation module, actions and objects are combined to generate commands that the robot can understand and execute.

Experiments on the MPII Cooking 2 dataset show that the multi-stage atrous pyramid network improves on various metrics of the action segmentation task, and that the proposed command learning method generates robot command sequences from untrimmed videos with high accuracy. Finally, we deployed our system on a Baxter robot to further verify the effectiveness of the framework.
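As a minimal sketch of the command generation step described above, the fragment below assumes the segmentation network outputs one action label per frame (including a hypothetical "background" label for idle frames); consecutive identical labels are collapsed into clips, and each clip is paired with the object recognized for it to form an (action, object) command. The function names and label strings here are illustrative assumptions, not the thesis's actual interface.

```python
from itertools import groupby

def frames_to_clips(frame_labels):
    """Collapse per-frame action labels into (action, start, end) clips."""
    clips, t = [], 0
    for action, run in groupby(frame_labels):
        n = len(list(run))
        if action != "background":          # skip idle frames
            clips.append((action, t, t + n - 1))
        t += n
    return clips

def clips_to_commands(clips, objects_per_clip):
    """Pair each action clip with its recognized object to form a command."""
    return [(action, obj) for (action, _, _), obj in zip(clips, objects_per_clip)]

# Toy per-frame output for a 7-frame demonstration video.
frame_labels = ["background", "grasp", "grasp", "pour", "pour", "pour", "background"]
clips = frames_to_clips(frame_labels)                 # [("grasp", 1, 2), ("pour", 3, 5)]
commands = clips_to_commands(clips, ["cup", "bottle"])
print(commands)                                       # [("grasp", "cup"), ("pour", "bottle")]
```

In the real framework the per-frame labels come from the multi-stage atrous pyramid network and the objects from the detection-based recognition module; this sketch only shows how the two streams combine into an executable command sequence.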
Keywords/Search Tags: video commands learning, robot commands generation, action segmentation, atrous convolution