
Visual Learning Based Method For Robot Learning Skills From Videos

Posted on: 2021-01-01
Degree: Master
Type: Thesis
Country: China
Candidate: J H Chen
Full Text: PDF
GTID: 2518306470463124
Subject: Computer Science and Technology
Abstract/Summary:
Visual learning is an important way for robots to acquire human skills, and an effective means of raising a robot's level of intelligence. Compared with the traditional pre-programmed mode, visual learning greatly reduces the time required for manual programming and speeds up deployment in new scenarios. Moreover, skills learned from videos are more robust: they can be executed in more complex scenarios and improve the operability of tasks.

In this thesis, we propose a visual learning based method for robots to learn skills from videos. Unlike other visual learning methods, the proposed method can learn the corresponding robot commands from complex action videos. Specifically, the method contains three modules: an action segmentation module, an object recognition module, and a linguistics-based command generation module. The first module uses the Two-Stream Inflated 3D ConvNet (I3D) to extract action features, then adopts the Encoder-Decoder Temporal Convolutional Network (ED-TCN) to refine those features and segment the video into clips, each of which contains a single action. In the second module, we adopt Mask R-CNN to extract object features, fuse them with the action features, and use CatBoost to classify objects into subject and patient roles. In the last module, we establish linguistic descriptions to generate summative commands from the recognized objects and actions, together with the operating hand in the real-world environment; the summative commands are then translated into robot instructions for execution. (Illustrative sketches of the three modules follow the abstract.)

To verify the effectiveness of the proposed method, we conduct experiments on the public MPII Cooking Activities Dataset. The results show that the method effectively converts videos into summative commands, which run successfully on the humanoid robot Baxter.
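To make the segmentation stage concrete, the following is a minimal PyTorch sketch of an encoder-decoder temporal convolutional network in the spirit of ED-TCN, operating on pre-extracted I3D frame features. The feature dimension, layer widths, and kernel size here are illustrative assumptions, not the configuration used in the thesis.

import torch
import torch.nn as nn

class EDTCN(nn.Module):
    """Encoder-decoder temporal conv net for per-frame action labeling.

    Input:  (batch, feat_dim, T) frame features, e.g. pooled from I3D.
    Output: (batch, n_classes, T) per-frame action logits.
    """
    def __init__(self, feat_dim=1024, n_classes=10, k=25):
        super().__init__()
        # Encoder: temporal convolutions, each followed by max-pooling
        # that halves the sequence length.
        self.enc = nn.Sequential(
            nn.Conv1d(feat_dim, 128, k, padding=k // 2), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(128, 160, k, padding=k // 2), nn.ReLU(), nn.MaxPool1d(2),
        )
        # Decoder: upsampling restores the original temporal resolution.
        self.dec = nn.Sequential(
            nn.Upsample(scale_factor=2), nn.Conv1d(160, 128, k, padding=k // 2), nn.ReLU(),
            nn.Upsample(scale_factor=2), nn.Conv1d(128, 128, k, padding=k // 2), nn.ReLU(),
        )
        self.head = nn.Conv1d(128, n_classes, 1)  # 1x1 conv classifies each frame

    def forward(self, x):
        return self.head(self.dec(self.enc(x)))

# Frame-wise predictions; contiguous runs of one label form a video clip.
logits = EDTCN()(torch.randn(1, 1024, 64))  # T must be divisible by 4 here
frame_labels = logits.argmax(dim=1)         # shape (1, 64)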
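The role-classification step can be sketched similarly: per-object appearance features from Mask R-CNN are concatenated with clip-level action features, and a CatBoost classifier separates subject objects from patient objects (roughly, the object doing the acting versus the object acted upon). The feature dimensions and the random data below are placeholders rather than the thesis's actual features.

import numpy as np
from catboost import CatBoostClassifier

rng = np.random.default_rng(0)
obj_feats = rng.normal(size=(200, 256))   # stand-in for pooled Mask R-CNN ROI features
act_feats = rng.normal(size=(200, 400))   # stand-in for clip-level action features
X = np.concatenate([obj_feats, act_feats], axis=1)  # fused object + action features
y = rng.integers(0, 2, size=200)          # 0 = subject object, 1 = patient object

clf = CatBoostClassifier(iterations=200, depth=4, verbose=False)
clf.fit(X, y)
roles = clf.predict(X[:5])                # predicted role for each detected object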
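Finally, a hypothetical illustration of the command-generation step. The thesis's linguistic templates are not reproduced here, so this sketch simply maps a recognized (action, hand, subject, patient) tuple to a summative command string and then to a named robot primitive; the primitive names and the fallback behavior are assumptions for illustration only.

# Hypothetical verb-to-primitive table for a Baxter-like executor.
PRIMITIVES = {"cut": "slice_motion", "pour": "tilt_motion", "stir": "circular_motion"}

def summative_command(action, hand, subject, patient):
    # Flat, human-readable summary of one segmented video clip.
    return f"{hand} hand: {action}({subject}, {patient})"

def to_robot_instruction(action, patient):
    # Unknown verbs fall back to a generic grasp-and-move primitive.
    return {"primitive": PRIMITIVES.get(action, "grasp_move"), "target": patient}

print(summative_command("cut", "right", "knife", "cucumber"))
print(to_robot_instruction("cut", "cucumber"))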
Keywords/Search Tags: Visual learning, ED-TCN, Mask R-CNN, Robot