
Research On Manipulator Grasping Method Based On Reinforcement Learning And Meta-learning

Posted on: 2023-03-27
Degree: Master
Type: Thesis
Country: China
Candidate: M J Li
Full Text: PDF
GTID: 2568306836469734
Subject: Instrument Science and Technology
Abstract/Summary:
Manipulators are widely used in industry and services. Grasping is an important manipulator skill and a research hotspot in the field of robot learning. Grasping methods based on deep reinforcement learning can accomplish end-to-end grasping through autonomous learning, but they suffer from low learning efficiency. To address the problems of deep reinforcement learning in manipulator grasping, this thesis proposes a grasping method based on deep reinforcement learning and meta-learning that improves learning efficiency in three ways: increasing positive rewards, learning inductive biases, and reducing task complexity. The main research contents are as follows:

(1) Deep Deterministic Policy Gradient (DDPG) with a sparse reward function is used to train a manipulator to learn three basic grasping skills (approaching, grabbing, and placing) as well as 2D planar grasping. Hindsight Experience Replay (HER) is introduced to increase the density of positive rewards along trajectories, which solves the low learning efficiency and non-learning caused by sparse rewards and significantly improves policy convergence speed and performance.

(2) To address the low learning efficiency caused by having to relearn new approaching skills, Meta Q-Learning (MQL) is used to learn effective inductive biases from related approaching tasks and apply them to new approaching tasks. The sample size required to reach the same convergence performance is only 23% of that needed when learning from scratch, and ablation experiments show that obtaining trajectory context variables is the key to learning the inductive bias.

(3) Task complexity is reduced by decomposing the grasping task into three sub-tasks (approaching, grabbing, and placing), and the high-level and low-level policies are learned hierarchically and independently. On top of the learned low-level policies for the grasping sub-tasks, the Asynchronous Advantage Actor-Critic (A3C) algorithm is used to learn a high-level policy that orchestrates the execution order of the sub-tasks. Experiments show that the hierarchical training strategy solves the non-learning problem of end-to-end training, and the success rate of the resulting grasping policy exceeds that of the 2D planar grasping policy.
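The HER mechanism described in contribution (1) can be illustrated with a minimal sketch. This is not the thesis's implementation; it assumes a toy 1-D "reach" task with a sparse reward (0 on reaching the goal, -1 otherwise), and the names `sparse_reward` and `relabel_with_final_goal` are illustrative. The idea is that a failed episode is replayed as if the goal had been the state actually achieved at the end, so the replay buffer gains positive-reward transitions even when the original goal was never reached:

```python
def sparse_reward(achieved, goal):
    """Sparse reward: 0 on success, -1 otherwise."""
    return 0.0 if achieved == goal else -1.0

def relabel_with_final_goal(episode):
    """HER 'final' strategy: replay the episode with the goal replaced
    by the position actually reached at the end, so at least the last
    transition earns a success reward."""
    final_achieved = episode[-1]["achieved"]
    relabeled = []
    for t in episode:
        relabeled.append({
            "state": t["state"],
            "action": t["action"],
            "achieved": t["achieved"],
            "goal": final_achieved,  # substituted (hindsight) goal
            "reward": sparse_reward(t["achieved"], final_achieved),
        })
    return relabeled

# A failed episode: the agent aimed for goal 5 but only reached position 3,
# so every original reward is -1 and the trajectory carries no learning signal.
episode = [
    {"state": 0, "action": +1, "achieved": 1, "goal": 5, "reward": -1.0},
    {"state": 1, "action": +1, "achieved": 2, "goal": 5, "reward": -1.0},
    {"state": 2, "action": +1, "achieved": 3, "goal": 5, "reward": -1.0},
]

relabeled = relabel_with_final_goal(episode)
# Under the hindsight goal (3), the last transition now succeeds,
# increasing the density of positive rewards in the replay buffer.
print([t["reward"] for t in relabeled])  # → [-1.0, -1.0, 0.0]
```

In practice both the original and the relabeled transitions are stored, and the off-policy learner (DDPG in the thesis) trains on the mixed buffer; this is what lets the policy make progress despite the sparse reward.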
Keywords/Search Tags:manipulator grasping, deep reinforcement learning, sparse reward, meta-learning, hierarchical reinforcement learning