
Research on the Application of Deep Learning in Reinforcement Learning

Posted on: 2021-03-27  Degree: Master  Type: Thesis
Country: China  Candidate: L Zhan  Full Text: PDF
GTID: 2428330623468212  Subject: Computer Science and Technology
Abstract/Summary:
Deep learning (DL), a subfield of machine learning, has advanced rapidly across many research fields and has been applied to tasks including speech recognition, computer vision, and natural language processing. Deep reinforcement learning (DRL) combines deep learning with reinforcement learning, enabling an agent to learn in an interactive environment by trial and error. Today, DRL is applied in areas such as robotics, video games, and finance.

A replay buffer is used in many DRL algorithms. Because of hardware constraints, the memory budget of the replay buffer is limited. Conventionally, when the budget is exhausted, the oldest transition in the buffer is forgotten (discarded) to make room for the new transition. However, this approach ignores the differing importance of transitions, so important transitions may be forgotten before they have been studied sufficiently, which lowers the utilization of the replay buffer.

To address this problem, we propose an algorithm called priority forgetting experience replay (PFER), which determines the forgetting order by the importance of transitions rather than by how long they have been in the replay buffer. PFER assigns every transition in the replay buffer a forgetting priority weight computed from the temporal-difference error (TD error) of that transition. Transitions with high TD error have not yet been studied sufficiently and should be retained for further learning. When a new transition arrives and the buffer has no free memory, PFER chooses a transition to forget according to the forgetting priority weight, which retains the more important transitions and improves their utilization. In experiments on MuJoCo environments, DDPG with PFER learns more efficiently and obtains higher rewards than both the original DDPG and DDPG with prioritized experience replay (PER).

For the calculation of the forgetting priority weight, we further propose an algorithm called incremental priority forgetting experience replay (IPFER), which measures the importance of a transition by its TD error and the number of times it has been sampled. IPFER updates the forgetting priority weight incrementally: each time a transition is sampled and studied, its forgetting priority weight increases, and the increment is related to the TD error of the transition. The more often a transition is sampled, the larger its forgetting priority weight becomes and the more likely it is to be forgotten. In experiments on MuJoCo environments, DDPG with IPFER outperforms the original DDPG, DDPG with PER, and DDPG with PFER, obtaining higher rewards from the environments.
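To make the PFER forgetting rule concrete, the following is a minimal sketch of a replay buffer that evicts by priority rather than by age. The class and method names, and the choice of forgetting weight as the inverse of the absolute TD error (so low-error transitions are most likely to be dropped), are illustrative assumptions; the abstract does not specify the exact weighting or eviction rule used in the thesis.

```python
import random

import numpy as np


class PFERBuffer:
    """Sketch of priority forgetting experience replay (PFER).

    Assumption: forgetting weight = 1 / (|TD error| + eps), so transitions with
    low TD error (already well learned) are the most likely to be forgotten.
    """

    def __init__(self, capacity, eps=1e-6):
        self.capacity = capacity
        self.eps = eps
        self.transitions = []      # stored (s, a, r, s_next, done) tuples
        self.forget_weights = []   # one forgetting weight per transition

    def _weight(self, td_error):
        # Low |TD error|  ->  large forgetting weight  ->  more likely forgotten.
        return 1.0 / (abs(td_error) + self.eps)

    def add(self, transition, td_error):
        if len(self.transitions) >= self.capacity:
            # Forget a transition chosen in proportion to its forgetting weight,
            # instead of always discarding the oldest one.
            probs = np.asarray(self.forget_weights)
            probs = probs / probs.sum()
            idx = int(np.random.choice(len(self.transitions), p=probs))
            self.transitions.pop(idx)
            self.forget_weights.pop(idx)
        self.transitions.append(transition)
        self.forget_weights.append(self._weight(td_error))

    def sample(self, batch_size):
        # Uniform sampling; PER-style prioritized sampling could be layered on top.
        idxs = random.sample(range(len(self.transitions)), batch_size)
        return idxs, [self.transitions[i] for i in idxs]

    def update_td_errors(self, idxs, td_errors):
        # Refresh forgetting weights after the critic recomputes TD errors.
        for i, err in zip(idxs, td_errors):
            self.forget_weights[i] = self._weight(err)
```

In a DDPG loop, the agent would call `update_td_errors` after each gradient step so that frequently studied, low-error transitions become the first candidates for forgetting.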
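The incremental variant can be sketched in the same way. Here the forgetting weight grows each time a transition is sampled, as the abstract describes; the specific increment rule (taken below as the inverse of the absolute TD error) and the initial weight for new transitions are assumptions, not the thesis's stated formula.

```python
import numpy as np


class IPFERBuffer:
    """Sketch of incremental priority forgetting experience replay (IPFER).

    Assumption: each time a transition is sampled, its forgetting weight grows by
    1 / (|TD error| + eps), so often-sampled, low-error transitions accumulate
    forgetting weight fastest and are the most likely to be forgotten.
    """

    def __init__(self, capacity, eps=1e-6):
        self.capacity = capacity
        self.eps = eps
        self.transitions = []
        self.forget_weights = []

    def add(self, transition):
        if len(self.transitions) >= self.capacity:
            probs = np.asarray(self.forget_weights)
            probs = probs / probs.sum()
            idx = int(np.random.choice(len(self.transitions), p=probs))
            self.transitions.pop(idx)
            self.forget_weights.pop(idx)
        self.transitions.append(transition)
        # New transitions start with a tiny weight, so they are rarely forgotten
        # before they have been sampled and studied.
        self.forget_weights.append(self.eps)

    def sample(self, batch_size):
        idxs = np.random.choice(len(self.transitions), batch_size, replace=False)
        return idxs, [self.transitions[i] for i in idxs]

    def on_sampled(self, idxs, td_errors):
        # Incremental update: every sampled transition's forgetting weight grows,
        # by an amount tied to its current TD error.
        for i, err in zip(idxs, td_errors):
            self.forget_weights[i] += 1.0 / (abs(err) + self.eps)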
Keywords/Search Tags: Deep Learning, Deep Reinforcement Learning, Replay Buffer, PFER, Forgetting