
Research On Manipulator Grasping Method Based On Reinforcement Learning And Meta-learning

Posted on: 2023-03-27
Degree: Master
Type: Thesis
Country: China
Candidate: M J Li
Full Text: PDF
GTID: 2568306836469734
Subject: Instrument Science and Technology
Abstract/Summary:
Manipulators are widely used in industry and services. Grasping is an important manipulator skill and a research hotspot in the field of robot learning. Grasping methods based on deep reinforcement learning can accomplish end-to-end grasping through autonomous learning, but they suffer from low learning efficiency. To address the problems of deep reinforcement learning in manipulator grasping, this thesis proposes a grasping method based on deep reinforcement learning and meta-learning that improves learning efficiency in three ways: increasing positive rewards, learning inductive biases, and reducing task complexity. The main research contents are as follows:

(1) Deep Deterministic Policy Gradient (DDPG) with a sparse reward function is used to train a manipulator to learn three basic grasping skills (approaching, grabbing, and placing) as well as 2D planar grasping. Hindsight Experience Replay (HER) is introduced to increase the density of positive rewards along trajectories, which solves the low learning efficiency and non-learning caused by sparse rewards and significantly improves policy convergence speed and performance.

(2) To address the low learning efficiency caused by having to relearn new approaching skills, Meta Q-Learning (MQL) is used to learn effective inductive biases from related approaching tasks and apply them to new approaching tasks. The sample size required to reach the same convergence performance is only 23% of that needed when learning from scratch, and ablation experiments show that obtaining trajectory context variables is the key to learning the inductive bias.

(3) Task complexity is reduced by decomposing the grasping task into three sub-tasks (approaching, grabbing, and placing), and the high-level and low-level policies are learned hierarchically and independently. On top of the learned low-level policies for the grasping sub-tasks, the Asynchronous Advantage Actor-Critic (A3C) algorithm is used to learn a high-level policy that orchestrates the execution order of the sub-tasks. Experiments show that the hierarchical training strategy solves the non-learning problem of end-to-end training, and the success rate of the resulting grasping policy exceeds that of the 2D planar grasping policy.
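The HER mechanism described in contribution (1) can be illustrated with a minimal sketch. This is not the thesis's implementation; it assumes a toy 1-D "reach" task with a sparse reward (0 on reaching the goal, -1 otherwise), and the names `sparse_reward` and `relabel_with_final_goal` are illustrative. The idea is that a failed episode is replayed as if the goal had been the state actually achieved at the end, so the replay buffer gains positive-reward transitions even when the original goal was never reached:

```python
def sparse_reward(achieved, goal):
    """Sparse reward: 0 on success, -1 otherwise."""
    return 0.0 if achieved == goal else -1.0

def relabel_with_final_goal(episode):
    """HER 'final' strategy: replay the episode with the goal replaced
    by the position actually reached at the end, so at least the last
    transition earns a success reward."""
    final_achieved = episode[-1]["achieved"]
    relabeled = []
    for t in episode:
        relabeled.append({
            "state": t["state"],
            "action": t["action"],
            "achieved": t["achieved"],
            "goal": final_achieved,  # substituted (hindsight) goal
            "reward": sparse_reward(t["achieved"], final_achieved),
        })
    return relabeled

# A failed episode: the agent aimed for goal 5 but only reached position 3,
# so every original reward is -1 and the trajectory carries no learning signal.
episode = [
    {"state": 0, "action": +1, "achieved": 1, "goal": 5, "reward": -1.0},
    {"state": 1, "action": +1, "achieved": 2, "goal": 5, "reward": -1.0},
    {"state": 2, "action": +1, "achieved": 3, "goal": 5, "reward": -1.0},
]

relabeled = relabel_with_final_goal(episode)
# Under the hindsight goal (3), the last transition now succeeds,
# increasing the density of positive rewards in the replay buffer.
print([t["reward"] for t in relabeled])  # → [-1.0, -1.0, 0.0]
```

In practice both the original and the relabeled transitions are stored, and the off-policy learner (DDPG in the thesis) trains on the mixed buffer; this is what lets the policy make progress despite the sparse reward.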
Keywords/Search Tags:manipulator grasping, deep reinforcement learning, sparse reward, meta-learning, hierarchical reinforcement learning