
Demonstrations With Dynamic Bonus For Deep Reinforcement Learning

Posted on: 2021-05-24    Degree: Master    Type: Thesis
Country: China    Candidate: Z H Duan    Full Text: PDF
GTID: 2518306464482864    Subject: Computer Science and Technology
Abstract/Summary:
With the development of Deep Reinforcement Learning (DRL), computers have surpassed humans in many domains such as Go and Dota 2. However, problems such as data inefficiency, heavy computational cost, and long training times continue to hinder the further development of DRL. Improving data utilization and reducing training time are therefore of significant research interest in this field.

Some researchers use transfer learning to assist DRL training. Transfer learning uses knowledge from a source domain to help training in a target domain; however, how to select the source domain remains an open question. Learning from demonstrations is another way to speed up training: it uses only a relatively small number of demonstrations yet greatly increases the learning speed of DRL. However, existing demonstration-based methods either require too much additional computation or struggle to balance learning from self-generated samples against learning from demonstrations.

To this end, we propose a method based on a dynamic bonus for demonstrations. Compared with existing methods, it offers two improvements:

(1) An evaluation mechanism in the pre-training stage that allocates each demonstration's bonus according to its contribution during pre-training. Compared with assigning the same initial bonus to all demonstrations, this better distinguishes the importance of individual demonstrations and improves the agent's utilization of them.

(2) A demonstration-based learning method that dynamically adjusts the bonus of demonstrations according to the agent's performance. Compared with a fixed bonus, the dynamic bonus effectively prevents the agent from overvaluing the demonstrations, allowing it to balance learning from demonstrations and from self-generated samples.

We compared our method with three other algorithms on the Atari benchmark and also conducted an ablation study. The experimental results indicate that, compared with existing methods, our method improves the average score by up to 20% without adding extra computation. In addition, the proposed dynamic bonus effectively balances learning from self-generated samples and demonstrations, reduces the risk of over-learning from demonstrations, and improves the agent's final performance.
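The abstract does not give the exact update rules, so the following Python sketch is only an illustrative reconstruction of the general idea, not the thesis's implementation. All names (DemoBonusBuffer, add, sample, update_bonus), the proportional-priority sampling, and the linear decay schedule are assumptions: demonstration transitions carry a bonus added to their replay priority, the initial bonus can be set per transition (e.g. from a pre-training contribution score), and the bonus is shrunk as the agent's own return approaches the demonstrations' return, shifting learning toward self-generated samples.

```python
import numpy as np

class DemoBonusBuffer:
    """Hypothetical replay buffer sketch: demonstration transitions get a
    priority bonus that is (a) set per transition when added, e.g. from a
    pre-training contribution score, and (b) decayed as the agent's own
    performance approaches the demonstration return."""

    def __init__(self, capacity=100_000, eps=1e-3):
        self.capacity = capacity
        self.eps = eps                      # keeps priorities strictly positive
        self.data, self.priority, self.bonus = [], [], []

    def add(self, transition, td_error, demo_bonus=0.0):
        # demo_bonus > 0 marks a demonstration transition; 0 marks a self-generated one.
        if len(self.data) >= self.capacity:
            self.data.pop(0); self.priority.pop(0); self.bonus.pop(0)
        self.data.append(transition)
        self.priority.append(abs(td_error) + self.eps)
        self.bonus.append(demo_bonus)

    def sample(self, batch_size):
        # Sampling probability proportional to TD-error priority plus current demo bonus.
        p = np.asarray(self.priority) + np.asarray(self.bonus)
        p = p / p.sum()
        idx = np.random.choice(len(self.data), size=batch_size, p=p)
        return [self.data[i] for i in idx], idx

    def update_bonus(self, agent_return, demo_return, max_bonus=1.0):
        # Dynamic bonus (assumed linear schedule): shrink toward zero as the agent
        # catches up with the demonstrations, so learning gradually shifts from
        # demonstration transitions to self-generated samples.
        scale = max(0.0, 1.0 - agent_return / max(demo_return, 1e-8))
        self.bonus = [max_bonus * scale if b > 0 else 0.0 for b in self.bonus]
```

In use, update_bonus would be called periodically (e.g. after each evaluation episode) so that, once the agent matches the demonstration return, demonstration transitions are sampled only according to their TD error, like any other sample.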
Keywords/Search Tags: Reinforcement learning, Demonstrations, Transfer learning, Deep learning