
Demonstrations With Dynamic Bonus For Deep Reinforcement Learning

Posted on: 2021-05-24    Degree: Master    Type: Thesis
Country: China    Candidate: Z H Duan    Full Text: PDF
GTID: 2518306464482864    Subject: Computer Science and Technology
Abstract/Summary:
With the development of Deep Reinforcement Learning (DRL), computers have surpassed humans in many domains such as Go and Dota 2. However, problems such as data inefficiency, heavy computational cost, and long training times continue to hinder the further development of DRL. Improving data utilization and reducing training time are therefore of significant research interest in this field.

Some researchers use transfer learning to assist DRL training. Transfer learning uses knowledge from a source domain to help training in a target domain; however, how to select the source domain remains an open question. Learning from demonstrations is another way to speed up training: it uses only a relatively small number of demonstrations yet greatly increases the learning speed of DRL. However, existing demonstration-based methods either require too much additional computation or struggle to balance learning from self-generated samples against learning from demonstrations.

To this end, we propose a method based on a dynamic bonus for demonstrations. Compared with existing methods, it offers two improvements:

(1) An evaluation mechanism in the pre-training stage that allocates each demonstration's bonus according to its contribution during pre-training. Compared with assigning the same initial bonus to all demonstrations, this better distinguishes the importance of individual demonstrations and improves the agent's utilization of them.

(2) A demonstration-based learning method that dynamically adjusts the bonus of demonstrations according to the agent's performance. Compared with a fixed bonus, the dynamic bonus effectively prevents the agent from overvaluing the demonstrations, allowing it to balance learning from demonstrations and from self-generated samples.

We compared our method with three other algorithms on the Atari benchmark and also conducted an ablation study. The experimental results indicate that, compared with existing methods, our method improves the average score by up to 20% without adding extra computation. In addition, the proposed dynamic bonus effectively balances learning from self-generated samples and demonstrations, reduces the risk of over-learning from demonstrations, and improves the agent's final performance.
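The abstract does not give the exact update rules, so the following Python sketch is only an illustrative reconstruction of the general idea, not the thesis's implementation. All names (DemoBonusBuffer, add, sample, update_bonus), the proportional-priority sampling, and the linear decay schedule are assumptions: demonstration transitions carry a bonus added to their replay priority, the initial bonus can be set per transition (e.g. from a pre-training contribution score), and the bonus is shrunk as the agent's own return approaches the demonstrations' return, shifting learning toward self-generated samples.

```python
import numpy as np

class DemoBonusBuffer:
    """Hypothetical replay buffer sketch: demonstration transitions get a
    priority bonus that is (a) set per transition when added, e.g. from a
    pre-training contribution score, and (b) decayed as the agent's own
    performance approaches the demonstration return."""

    def __init__(self, capacity=100_000, eps=1e-3):
        self.capacity = capacity
        self.eps = eps                      # keeps priorities strictly positive
        self.data, self.priority, self.bonus = [], [], []

    def add(self, transition, td_error, demo_bonus=0.0):
        # demo_bonus > 0 marks a demonstration transition; 0 marks a self-generated one.
        if len(self.data) >= self.capacity:
            self.data.pop(0); self.priority.pop(0); self.bonus.pop(0)
        self.data.append(transition)
        self.priority.append(abs(td_error) + self.eps)
        self.bonus.append(demo_bonus)

    def sample(self, batch_size):
        # Sampling probability proportional to TD-error priority plus current demo bonus.
        p = np.asarray(self.priority) + np.asarray(self.bonus)
        p = p / p.sum()
        idx = np.random.choice(len(self.data), size=batch_size, p=p)
        return [self.data[i] for i in idx], idx

    def update_bonus(self, agent_return, demo_return, max_bonus=1.0):
        # Dynamic bonus (assumed linear schedule): shrink toward zero as the agent
        # catches up with the demonstrations, so learning gradually shifts from
        # demonstration transitions to self-generated samples.
        scale = max(0.0, 1.0 - agent_return / max(demo_return, 1e-8))
        self.bonus = [max_bonus * scale if b > 0 else 0.0 for b in self.bonus]
```

In use, update_bonus would be called periodically (e.g. after each evaluation episode) so that, once the agent matches the demonstration return, demonstration transitions are sampled only according to their TD error, like any other sample.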
Keywords/Search Tags: Reinforcement learning, Demonstrations, Transfer learning, Deep learning