
Research on Improving the Experience Replay Mechanism in Deep Reinforcement Learning

Posted on: 2021-04-12  Degree: Master  Type: Thesis
Country: China  Candidate: Y Liu  Full Text: PDF
GTID: 2428330614961166  Subject: Applied Mathematics
Abstract/Summary:
The learning performance of prioritized experience replay in deep reinforcement learning suffers from three problems: it is easily distorted by outliers in the temporal-difference (TD) error, it neglects experiences with low immediate reward and small TD error, and it is unstable. To address these problems, this thesis proposes improved prioritized experience replay algorithms and applies them to the Deep Deterministic Policy Gradient (DDPG) algorithm and the Deep Q-Network (DQN) algorithm, respectively.
To address the low experience utilization and poor performance of DDPG, a new variant, DDPG with Composite Priority Experience Replay (DDPG-CPER), is proposed. Samples are ranked twice, once by immediate reward and once by TD error; the two rankings are then averaged into a composite rank that determines each sample's priority, and experiences are drawn according to this priority to train the learning networks. Comparative experiments verify the effectiveness of the algorithm.
Because DQN ignores experiences with low immediate reward and small TD error, and because overusing a small set of experiences causes the network to overfit, this thesis further proposes a DQN algorithm with an exit mechanism and second-sampling prioritized experience replay (DQN-SSPE). First, samples are ranked by immediate reward and TD error to build priorities, and a number of samples are drawn; then the ranking is reversed and a further set of samples is drawn, so that low-priority experiences are also replayed. In addition, the total number of times each sample can be replayed is capped: once a sample reaches the limit, it exits and cannot be resampled. These samples are used to train the learning network, and corresponding experiments verify the effectiveness of the algorithm.
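The two mechanisms sketched in the abstract, composite rank-based priorities and a per-sample replay cap, can be illustrated with a minimal Python buffer. This is a sketch under stated assumptions, not the thesis's implementation: the class name `CompositeReplayBuffer`, the weighting formula, and parameters such as `max_uses` are hypothetical choices made for illustration.

```python
import random
from dataclasses import dataclass

@dataclass
class Transition:
    state: tuple
    action: int
    reward: float       # immediate return r
    next_state: tuple
    td_error: float     # TD error delta supplied by the learner
    uses: int = 0       # how many times this sample has been replayed

class CompositeReplayBuffer:
    """Sketch of composite-priority replay with an exit mechanism.

    Composite priority: rank samples by immediate reward and by |TD error|
    separately, then average the two ranks; a smaller averaged rank gives a
    larger sampling weight. Exit mechanism: a sample that has been replayed
    `max_uses` times is no longer eligible for sampling.
    """

    def __init__(self, capacity=10000, max_uses=5):
        self.capacity = capacity
        self.max_uses = max_uses
        self.data = []

    def add(self, t: Transition):
        # Drop the oldest transition when the buffer is full.
        if len(self.data) >= self.capacity:
            self.data.pop(0)
        self.data.append(t)

    def _composite_priorities(self, pool):
        # Rank 0 = highest reward / largest |TD error|.
        by_reward = sorted(pool, key=lambda t: -t.reward)
        by_td = sorted(pool, key=lambda t: -abs(t.td_error))
        r_rank = {id(t): i for i, t in enumerate(by_reward)}
        d_rank = {id(t): i for i, t in enumerate(by_td)}
        # Average the two ranks and turn the result into a positive weight.
        return [1.0 / (1 + 0.5 * (r_rank[id(t)] + d_rank[id(t)]))
                for t in pool]

    def sample(self, batch_size):
        # Exit mechanism: exclude samples that reached the replay cap.
        pool = [t for t in self.data if t.uses < self.max_uses]
        if not pool:
            return []
        weights = self._composite_priorities(pool)
        batch = random.choices(pool, weights=weights,
                               k=min(batch_size, len(pool)))
        for t in batch:
            t.uses += 1
        return batch
```

In this sketch the sample with both the highest reward and the largest TD error receives averaged rank 0 and hence the maximum weight, while repeatedly replayed samples eventually drop out of the eligible pool, which is the overfitting countermeasure the abstract describes.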
Keywords/Search Tags: Reinforcement Learning, Deep Reinforcement Learning, Deep Deterministic Policy Gradient, Deep Q-Network, Experience Replay Mechanism