
Research On Experience Replay In Deep Reinforcement Learning

Posted on: 2022-02-19  Degree: Master  Type: Thesis
Country: China  Candidate: Y Liu  Full Text: PDF
GTID: 2518306740982919  Subject: Computer technology
Abstract/Summary:
Experience replay (ER) is an important component of deep reinforcement learning (DRL): experience retained in a replay buffer is repeatedly sampled to optimize the target policy. Current experience replay suffers from two problems. 1) Experience retention relies on either a full-retention or a first-in-first-out (FIFO) replay buffer and requires a large number of samples generated by interacting with the environment, so learning speed and sample utilization need to be improved. 2) Prioritized sampling breaks the original distribution in the replay buffer and increases the distance between the state distribution of the experience and the state distribution of the policy, resulting in high return variance and poor algorithm stability.

(1) To address slow learning and low sample utilization, this thesis proposes the Dual Replay Buffer (DRB), which exploits the difference in state distribution between experiences retained under different retention schemes. DRB maintains both a full-retention buffer and a FIFO buffer, which respectively retain experience close to the global state distribution of the environment and experience close to the state distribution of the current policy, and updates the network with mixed sampling from the two buffers, thereby speeding up policy learning and improving sample utilization.

(2) To address unstable performance and large return variance, this thesis proposes the Prioritized Dual Replay Buffer (PDRB), which targets the distance between the state distribution of the policy and the state distribution of the experience. Building on DRB, PDRB combines prioritized sampling with experience filtering: priorities are measured by the temporal-difference (TD) error and used for sampling, experience far from the state distribution of the policy is filtered out, and the network is updated with a piecewise loss function, which improves the stability of the algorithm.

This thesis compares the learning speed and return of different experience replay methods on control tasks in the Gym and PyBullet environments. The experimental results show that, compared with FIFO and full-retention experience replay, the learning speed of the DRB-based DRL algorithm is greatly improved, and the number of episodes required to reach 80% of the maximum return is reduced by about 33.17%. Compared with prioritized experience replay and remember-and-forget experience replay, the average return of the PDRB-based DRL algorithm is increased by 12.25% and the variance of the average return is reduced by 45.85%; the performance and stability of the algorithm are significantly improved.
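To make the dual-buffer idea concrete, the following is a minimal sketch in Python, assuming a simple transition tuple, an unbounded full-retention buffer, a bounded FIFO buffer, and a mixing ratio for sampling. The class name, constructor parameters, and mixing scheme are illustrative assumptions, not the thesis's actual implementation.

```python
import random
from collections import deque


class DualReplayBuffer:
    """Sketch of a dual replay buffer (hypothetical interface, not from the thesis).

    - full_buffer: keeps every transition seen (full retention), roughly
      reflecting the global state distribution of the environment.
    - fifo_buffer: keeps only the most recent transitions, roughly
      reflecting the state distribution of the current policy.
    Mini-batches mix samples from both buffers according to `mix_ratio`.
    """

    def __init__(self, fifo_capacity=100_000, mix_ratio=0.5):
        self.full_buffer = []                            # full retention, grows without bound
        self.fifo_buffer = deque(maxlen=fifo_capacity)   # first in, first out
        self.mix_ratio = mix_ratio                       # fraction of the batch drawn from the FIFO buffer

    def add(self, state, action, reward, next_state, done):
        transition = (state, action, reward, next_state, done)
        self.full_buffer.append(transition)
        self.fifo_buffer.append(transition)

    def sample(self, batch_size):
        n_fifo = int(batch_size * self.mix_ratio)
        n_full = batch_size - n_fifo
        batch = random.sample(self.fifo_buffer, min(n_fifo, len(self.fifo_buffer)))
        batch += random.sample(self.full_buffer, min(n_full, len(self.full_buffer)))
        return batch
```

A PDRB-style variant would additionally weight sampling by the TD error of each transition and filter out transitions whose states lie far from the current policy's state distribution; those details depend on the full text of the thesis and are not reproduced here.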
Keywords/Search Tags:Deep reinforcement learning, experience replay, state distribution, prioritized sampling