
The Grasp Manipulation Strategy Of Robotic Arm Based On Deep Reinforcement Learning

Posted on: 2021-03-08
Degree: Master
Type: Thesis
Country: China
Candidate: X K Fu
Full Text: PDF
GTID: 2428330602486022
Subject: Control Science and Engineering
Abstract/Summary:
Manipulators are widely used in industrial production. Most grasp algorithms rely on prior knowledge, such as hand-eye calibration results and features of the target object, and can only grasp specific types of objects. These algorithms cannot be redeployed effectively when the task scenario or the target objects change. This thesis studies the general grasping decision process of a robotic arm in a simulation environment. The general grasping process defined in this thesis meets the following four constraints:
· The algorithm should be able to grasp arbitrary objects in the specified task scenario.
· The grasp algorithm takes only sensor observations as input, such as the camera image, the state of the robot arm, and the feedback information of the gripper.
· A dense reward requires extra information, such as the real coordinates of the target object, which is difficult to obtain directly on the physical platform. The algorithm therefore uses sparse reward feedback, meaning that positive feedback is obtained only on a successful grasp.
· Continuous motion of the end-effector in the Cartesian coordinate system is used as the output of the algorithm.

This thesis uses deep reinforcement learning to study the end-to-end decision process of general object grasping. Under the above constraints, the reinforcement learning method encounters four main problems: long interaction time, low sampling efficiency, insufficient sample utilization, and limited exploration ability. The research in this thesis focuses on these four issues, and its main contributions can be summarized as follows:

1. To overcome the problems of long interaction time and low sampling efficiency, two improvements are introduced on top of the two basic algorithms, DQN and DDPG: a guided strategy provided by a controller, and distributed training. Because of the constraints of high-dimensional states, sparse rewards, and continuous actions, it is difficult to obtain effective feedback when training with the original DQN or DDPG algorithms, which makes the training process inefficient. A guided strategy with a success rate of about 10% is designed to replace completely random exploration; this improves the sampling efficiency and achieves a success rate of about 50%. To address the long interaction time, this thesis implements a distributed deployment of the two algorithms using a framework of multiple interaction nodes and a single learning node. Besides improving the sampling efficiency, this also adds some exploration ability, and the grasp performance of the algorithm improves to 55%.

2. This thesis proposes an integrated deterministic policy algorithm (BC-EDDPG) based on imitation learning. The algorithm improves training efficiency through imitation learning on expert demonstration data, and also improves sample utilization. The policy network integration offsets the inadequate exploration introduced by imitation learning. With this improvement, the algorithm achieves a success rate of about 70% when grasping new objects. At the same time, the amount of interaction data used in training is reduced from more than 100,000 to about 20,000, which lays the foundation for subsequent experiments on the physical platform.
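
The sparse reward and guided exploration described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the names sparse_reward, choose_action, and ReplayBuffer, the guide_prob parameter, and the (dx, dy, dz, gripper) action layout are all assumptions made for illustration. The abstract does not say how guided and learned actions are mixed, so a simple probabilistic switch is assumed here.

```python
import random
from collections import deque


def sparse_reward(grasp_succeeded):
    """Sparse feedback: a positive reward only when the grasp succeeds."""
    return 1.0 if grasp_succeeded else 0.0


def choose_action(policy_action, guided_action, guide_prob, rng):
    """Guided exploration (assumed scheme): with probability guide_prob,
    follow the scripted controller; otherwise use the learned policy.

    Actions are hypothetical (dx, dy, dz, gripper) tuples, i.e. a Cartesian
    end-effector displacement plus a gripper command.
    Returns the chosen action and a label naming its source.
    """
    if rng.random() < guide_prob:
        return guided_action, "guide"
    return policy_action, "policy"


class ReplayBuffer:
    """Fixed-capacity transition store shared by off-policy learners
    such as DQN or DDPG; the oldest transitions are evicted first."""

    def __init__(self, capacity):
        self.data = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.data.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng):
        # Sample without replacement, capped at the number stored.
        k = min(batch_size, len(self.data))
        return rng.sample(list(self.data), k)

    def __len__(self):
        return len(self.data)
```

In a setup like the one the abstract describes, several interaction nodes could each call choose_action and push transitions into a buffer read by a single learning node. How the nodes communicate is left out here because the abstract does not specify it.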
Keywords/Search Tags: grasp manipulation, deep reinforcement learning, imitation learning, integrated algorithms