
Research On Experience Replay In Deep Reinforcement Learning

Posted on: 2022-02-19  Degree: Master  Type: Thesis
Country: China  Candidate: Y Liu  Full Text: PDF
GTID: 2518306740982919  Subject: Computer technology
Abstract/Summary:
Experience replay (ER) is an important component of deep reinforcement learning (DRL): experience retained in a replay buffer is repeatedly sampled to optimize the target policy. Current experience replay suffers from two problems. 1) Experience retention relies on either a full-retention or a first-in-first-out (FIFO) replay buffer and requires a large number of samples generated by interacting with the environment, so learning speed and sample utilization need to be improved. 2) Prioritized sampling breaks the original distribution in the replay buffer and increases the distance between the state distribution of the experience and the state distribution of the policy, resulting in high return variance and poor algorithm stability.

(1) To address slow learning and low sample utilization, this thesis proposes the Dual Replay Buffer (DRB), which exploits the difference in state distribution between experiences retained under different retention schemes. DRB maintains both a full-retention buffer and a FIFO buffer, which respectively retain experience close to the global state distribution of the environment and experience close to the state distribution of the current policy, and updates the network with mixed sampling from the two buffers, thereby speeding up policy learning and improving sample utilization.

(2) To address unstable performance and large return variance, this thesis proposes the Prioritized Dual Replay Buffer (PDRB), which targets the distance between the state distribution of the policy and the state distribution of the experience. Building on DRB, PDRB combines prioritized sampling with experience filtering: priorities are measured by the temporal-difference (TD) error and used for sampling, experience far from the state distribution of the policy is filtered out, and the network is updated with a piecewise loss function, which improves the stability of the algorithm.

This thesis compares the learning speed and return of different experience replay methods on control tasks in the Gym and PyBullet environments. The experimental results show that, compared with FIFO and full-retention experience replay, the learning speed of the DRB-based DRL algorithm is greatly improved, and the number of episodes required to reach 80% of the maximum return is reduced by about 33.17%. Compared with prioritized experience replay and remember-and-forget experience replay, the average return of the PDRB-based DRL algorithm is increased by 12.25% and the variance of the average return is reduced by 45.85%; the performance and stability of the algorithm are significantly improved.
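To make the dual-buffer idea concrete, the following is a minimal sketch in Python, assuming a simple transition tuple, an unbounded full-retention buffer, a bounded FIFO buffer, and a mixing ratio for sampling. The class name, constructor parameters, and mixing scheme are illustrative assumptions, not the thesis's actual implementation.

```python
import random
from collections import deque


class DualReplayBuffer:
    """Sketch of a dual replay buffer (hypothetical interface, not from the thesis).

    - full_buffer: keeps every transition seen (full retention), roughly
      reflecting the global state distribution of the environment.
    - fifo_buffer: keeps only the most recent transitions, roughly
      reflecting the state distribution of the current policy.
    Mini-batches mix samples from both buffers according to `mix_ratio`.
    """

    def __init__(self, fifo_capacity=100_000, mix_ratio=0.5):
        self.full_buffer = []                            # full retention, grows without bound
        self.fifo_buffer = deque(maxlen=fifo_capacity)   # first in, first out
        self.mix_ratio = mix_ratio                       # fraction of the batch drawn from the FIFO buffer

    def add(self, state, action, reward, next_state, done):
        transition = (state, action, reward, next_state, done)
        self.full_buffer.append(transition)
        self.fifo_buffer.append(transition)

    def sample(self, batch_size):
        n_fifo = int(batch_size * self.mix_ratio)
        n_full = batch_size - n_fifo
        batch = random.sample(self.fifo_buffer, min(n_fifo, len(self.fifo_buffer)))
        batch += random.sample(self.full_buffer, min(n_full, len(self.full_buffer)))
        return batch
```

A PDRB-style variant would additionally weight sampling by the TD error of each transition and filter out transitions whose states lie far from the current policy's state distribution; those details depend on the full text of the thesis and are not reproduced here.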
Keywords/Search Tags:Deep reinforcement learning, experience replay, state distribution, prioritized sampling