
Visual Learning Based Method For Robot Learning Skills From Videos

Posted on: 2021-01-01
Degree: Master
Type: Thesis
Country: China
Candidate: J H Chen
Full Text: PDF
GTID: 2518306470463124
Subject: Computer Science and Technology
Abstract/Summary:
Visual learning is an important way for robots to acquire human skills, and an effective means of raising a robot's level of intelligence. Compared with the traditional pre-programmed mode, visual learning greatly reduces the time required for manual programming and speeds up deployment in new scenarios. Moreover, skills learned from videos are more robust: they can be executed in more complex scenarios and improve the operability of tasks.

In this thesis, we propose a visual learning based method for robots to learn skills from videos. Unlike other visual learning methods, the proposed method can learn the corresponding robot commands from complex action videos. Specifically, the method contains three modules: an action segmentation module, an object recognition module, and a linguistics-based command generation module. The first module uses the Two-Stream Inflated 3D ConvNet (I3D) to extract action features, then adopts the Encoder-Decoder Temporal Convolutional Network (ED-TCN) to refine those features and segment the video into clips, each of which contains a single action. In the second module, we adopt Mask R-CNN to extract object features, fuse them with the action features, and use CatBoost to classify objects into subject and patient roles. In the last module, we establish linguistic descriptions to generate summative commands from the recognized objects and actions, together with the operating hand in the real-world environment; the summative commands are then translated into robot instructions for execution. (Illustrative sketches of the three modules follow the abstract.)

To verify the effectiveness of the proposed method, we conduct experiments on the public MPII Cooking Activities Dataset. The results show that the method effectively converts videos into summative commands, which run successfully on the humanoid robot Baxter.
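To make the segmentation stage concrete, the following is a minimal PyTorch sketch of an encoder-decoder temporal convolutional network in the spirit of ED-TCN, operating on pre-extracted I3D frame features. The feature dimension, layer widths, and kernel size here are illustrative assumptions, not the configuration used in the thesis.

import torch
import torch.nn as nn

class EDTCN(nn.Module):
    """Encoder-decoder temporal conv net for per-frame action labeling.

    Input:  (batch, feat_dim, T) frame features, e.g. pooled from I3D.
    Output: (batch, n_classes, T) per-frame action logits.
    """
    def __init__(self, feat_dim=1024, n_classes=10, k=25):
        super().__init__()
        # Encoder: temporal convolutions, each followed by max-pooling
        # that halves the sequence length.
        self.enc = nn.Sequential(
            nn.Conv1d(feat_dim, 128, k, padding=k // 2), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(128, 160, k, padding=k // 2), nn.ReLU(), nn.MaxPool1d(2),
        )
        # Decoder: upsampling restores the original temporal resolution.
        self.dec = nn.Sequential(
            nn.Upsample(scale_factor=2), nn.Conv1d(160, 128, k, padding=k // 2), nn.ReLU(),
            nn.Upsample(scale_factor=2), nn.Conv1d(128, 128, k, padding=k // 2), nn.ReLU(),
        )
        self.head = nn.Conv1d(128, n_classes, 1)  # 1x1 conv classifies each frame

    def forward(self, x):
        return self.head(self.dec(self.enc(x)))

# Frame-wise predictions; contiguous runs of one label form a video clip.
logits = EDTCN()(torch.randn(1, 1024, 64))  # T must be divisible by 4 here
frame_labels = logits.argmax(dim=1)         # shape (1, 64)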
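The role-classification step can be sketched similarly: per-object appearance features from Mask R-CNN are concatenated with clip-level action features, and a CatBoost classifier separates subject objects from patient objects (roughly, the object doing the acting versus the object acted upon). The feature dimensions and the random data below are placeholders rather than the thesis's actual features.

import numpy as np
from catboost import CatBoostClassifier

rng = np.random.default_rng(0)
obj_feats = rng.normal(size=(200, 256))   # stand-in for pooled Mask R-CNN ROI features
act_feats = rng.normal(size=(200, 400))   # stand-in for clip-level action features
X = np.concatenate([obj_feats, act_feats], axis=1)  # fused object + action features
y = rng.integers(0, 2, size=200)          # 0 = subject object, 1 = patient object

clf = CatBoostClassifier(iterations=200, depth=4, verbose=False)
clf.fit(X, y)
roles = clf.predict(X[:5])                # predicted role for each detected object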
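Finally, a hypothetical illustration of the command-generation step. The thesis's linguistic templates are not reproduced here, so this sketch simply maps a recognized (action, hand, subject, patient) tuple to a summative command string and then to a named robot primitive; the primitive names and the fallback behavior are assumptions for illustration only.

# Hypothetical verb-to-primitive table for a Baxter-like executor.
PRIMITIVES = {"cut": "slice_motion", "pour": "tilt_motion", "stir": "circular_motion"}

def summative_command(action, hand, subject, patient):
    # Flat, human-readable summary of one segmented video clip.
    return f"{hand} hand: {action}({subject}, {patient})"

def to_robot_instruction(action, patient):
    # Unknown verbs fall back to a generic grasp-and-move primitive.
    return {"primitive": PRIMITIVES.get(action, "grasp_move"), "target": patient}

print(summative_command("cut", "right", "knife", "cucumber"))
print(to_robot_instruction("cut", "cucumber"))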
Keywords/Search Tags: Visual learning, ED-TCN, Mask R-CNN, Robot