
Research on the Application of Deep Learning in Reinforcement Learning

Posted on: 2021-03-27  Degree: Master  Type: Thesis
Country: China  Candidate: L Zhan  Full Text: PDF
GTID: 2428330623468212  Subject: Computer Science and Technology
Abstract/Summary:
Deep learning (DL), a subfield of machine learning, has advanced rapidly across many research fields and has been applied to tasks including speech recognition, computer vision, and natural language processing. Deep reinforcement learning (DRL) combines deep learning with reinforcement learning, enabling an agent to learn in an interactive environment by trial and error. Today, DRL is applied in areas such as robotics, video games, and finance.

A replay buffer is used in many DRL algorithms. Because of hardware constraints, the memory budget of the replay buffer is limited. Conventionally, when the budget is exhausted, the oldest transition in the buffer is forgotten (discarded) to make room for the new transition. However, this approach ignores the differing importance of transitions, so important transitions may be forgotten before they have been studied sufficiently, which lowers the utilization of the replay buffer.

To address this problem, we propose an algorithm called priority forgetting experience replay (PFER), which determines the forgetting order by the importance of transitions rather than by how long they have been in the replay buffer. PFER assigns every transition in the replay buffer a forgetting priority weight computed from the temporal-difference error (TD error) of that transition. Transitions with high TD error have not yet been studied sufficiently and should be retained for further learning. When a new transition arrives and the buffer has no free memory, PFER chooses a transition to forget according to the forgetting priority weight, which retains the more important transitions and improves their utilization. In experiments on MuJoCo environments, DDPG with PFER learns more efficiently and obtains higher rewards than both the original DDPG and DDPG with prioritized experience replay (PER).

For the calculation of the forgetting priority weight, we further propose an algorithm called incremental priority forgetting experience replay (IPFER), which measures the importance of a transition by its TD error and the number of times it has been sampled. IPFER updates the forgetting priority weight incrementally: each time a transition is sampled and studied, its forgetting priority weight increases, and the increment is related to the TD error of the transition. The more often a transition is sampled, the larger its forgetting priority weight becomes and the more likely it is to be forgotten. In experiments on MuJoCo environments, DDPG with IPFER outperforms the original DDPG, DDPG with PER, and DDPG with PFER, obtaining higher rewards from the environments.
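To make the PFER forgetting rule concrete, the following is a minimal sketch of a replay buffer that evicts by priority rather than by age. The class and method names, and the choice of forgetting weight as the inverse of the absolute TD error (so low-error transitions are most likely to be dropped), are illustrative assumptions; the abstract does not specify the exact weighting or eviction rule used in the thesis.

```python
import random

import numpy as np


class PFERBuffer:
    """Sketch of priority forgetting experience replay (PFER).

    Assumption: forgetting weight = 1 / (|TD error| + eps), so transitions with
    low TD error (already well learned) are the most likely to be forgotten.
    """

    def __init__(self, capacity, eps=1e-6):
        self.capacity = capacity
        self.eps = eps
        self.transitions = []      # stored (s, a, r, s_next, done) tuples
        self.forget_weights = []   # one forgetting weight per transition

    def _weight(self, td_error):
        # Low |TD error|  ->  large forgetting weight  ->  more likely forgotten.
        return 1.0 / (abs(td_error) + self.eps)

    def add(self, transition, td_error):
        if len(self.transitions) >= self.capacity:
            # Forget a transition chosen in proportion to its forgetting weight,
            # instead of always discarding the oldest one.
            probs = np.asarray(self.forget_weights)
            probs = probs / probs.sum()
            idx = int(np.random.choice(len(self.transitions), p=probs))
            self.transitions.pop(idx)
            self.forget_weights.pop(idx)
        self.transitions.append(transition)
        self.forget_weights.append(self._weight(td_error))

    def sample(self, batch_size):
        # Uniform sampling; PER-style prioritized sampling could be layered on top.
        idxs = random.sample(range(len(self.transitions)), batch_size)
        return idxs, [self.transitions[i] for i in idxs]

    def update_td_errors(self, idxs, td_errors):
        # Refresh forgetting weights after the critic recomputes TD errors.
        for i, err in zip(idxs, td_errors):
            self.forget_weights[i] = self._weight(err)
```

In a DDPG loop, the agent would call `update_td_errors` after each gradient step so that frequently studied, low-error transitions become the first candidates for forgetting.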
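The incremental variant can be sketched in the same way. Here the forgetting weight grows each time a transition is sampled, as the abstract describes; the specific increment rule (taken below as the inverse of the absolute TD error) and the initial weight for new transitions are assumptions, not the thesis's stated formula.

```python
import numpy as np


class IPFERBuffer:
    """Sketch of incremental priority forgetting experience replay (IPFER).

    Assumption: each time a transition is sampled, its forgetting weight grows by
    1 / (|TD error| + eps), so often-sampled, low-error transitions accumulate
    forgetting weight fastest and are the most likely to be forgotten.
    """

    def __init__(self, capacity, eps=1e-6):
        self.capacity = capacity
        self.eps = eps
        self.transitions = []
        self.forget_weights = []

    def add(self, transition):
        if len(self.transitions) >= self.capacity:
            probs = np.asarray(self.forget_weights)
            probs = probs / probs.sum()
            idx = int(np.random.choice(len(self.transitions), p=probs))
            self.transitions.pop(idx)
            self.forget_weights.pop(idx)
        self.transitions.append(transition)
        # New transitions start with a tiny weight, so they are rarely forgotten
        # before they have been sampled and studied.
        self.forget_weights.append(self.eps)

    def sample(self, batch_size):
        idxs = np.random.choice(len(self.transitions), batch_size, replace=False)
        return idxs, [self.transitions[i] for i in idxs]

    def on_sampled(self, idxs, td_errors):
        # Incremental update: every sampled transition's forgetting weight grows,
        # by an amount tied to its current TD error.
        for i, err in zip(idxs, td_errors):
            self.forget_weights[i] += 1.0 / (abs(err) + self.eps)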
Keywords/Search Tags: Deep Learning, Deep Reinforcement Learning, Replay Buffer, PFER, Forgetting