
Research on Improving the Experience Replay Mechanism in Deep Reinforcement Learning

Posted on: 2021-04-12  Degree: Master  Type: Thesis
Country: China  Candidate: Y Liu  Full Text: PDF
GTID: 2428330614961166  Subject: Applied Mathematics
Abstract/Summary:
The learning performance of prioritized experience replay in deep reinforcement learning suffers from three problems: it is easily distorted by outliers in the temporal-difference (TD) error, it neglects experiences with low immediate reward and small TD error, and it is unstable. To address these problems, this thesis proposes improved prioritized experience replay algorithms and applies them to the Deep Deterministic Policy Gradient (DDPG) algorithm and the Deep Q-Network (DQN) algorithm, respectively.
To address the low experience utilization and poor performance of DDPG, a new variant, DDPG with Composite Priority Experience Replay (DDPG-CPER), is proposed. Samples are ranked twice, once by immediate reward and once by TD error; the two rankings are then averaged into a composite rank that determines each sample's priority, and experiences are drawn according to this priority to train the learning networks. Comparative experiments verify the effectiveness of the algorithm.
Because DQN ignores experiences with low immediate reward and small TD error, and because overusing a small set of experiences causes the network to overfit, this thesis further proposes a DQN algorithm with an exit mechanism and second-sampling prioritized experience replay (DQN-SSPE). First, samples are ranked by immediate reward and TD error to build priorities, and a number of samples are drawn; then the ranking is reversed and a further set of samples is drawn, so that low-priority experiences are also replayed. In addition, the total number of times each sample can be replayed is capped: once a sample reaches the limit, it exits and cannot be resampled. These samples are used to train the learning network, and corresponding experiments verify the effectiveness of the algorithm.
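The two mechanisms sketched in the abstract, composite rank-based priorities and a per-sample replay cap, can be illustrated with a minimal Python buffer. This is a sketch under stated assumptions, not the thesis's implementation: the class name `CompositeReplayBuffer`, the weighting formula, and parameters such as `max_uses` are hypothetical choices made for illustration.

```python
import random
from dataclasses import dataclass

@dataclass
class Transition:
    state: tuple
    action: int
    reward: float       # immediate return r
    next_state: tuple
    td_error: float     # TD error delta supplied by the learner
    uses: int = 0       # how many times this sample has been replayed

class CompositeReplayBuffer:
    """Sketch of composite-priority replay with an exit mechanism.

    Composite priority: rank samples by immediate reward and by |TD error|
    separately, then average the two ranks; a smaller averaged rank gives a
    larger sampling weight. Exit mechanism: a sample that has been replayed
    `max_uses` times is no longer eligible for sampling.
    """

    def __init__(self, capacity=10000, max_uses=5):
        self.capacity = capacity
        self.max_uses = max_uses
        self.data = []

    def add(self, t: Transition):
        # Drop the oldest transition when the buffer is full.
        if len(self.data) >= self.capacity:
            self.data.pop(0)
        self.data.append(t)

    def _composite_priorities(self, pool):
        # Rank 0 = highest reward / largest |TD error|.
        by_reward = sorted(pool, key=lambda t: -t.reward)
        by_td = sorted(pool, key=lambda t: -abs(t.td_error))
        r_rank = {id(t): i for i, t in enumerate(by_reward)}
        d_rank = {id(t): i for i, t in enumerate(by_td)}
        # Average the two ranks and turn the result into a positive weight.
        return [1.0 / (1 + 0.5 * (r_rank[id(t)] + d_rank[id(t)]))
                for t in pool]

    def sample(self, batch_size):
        # Exit mechanism: exclude samples that reached the replay cap.
        pool = [t for t in self.data if t.uses < self.max_uses]
        if not pool:
            return []
        weights = self._composite_priorities(pool)
        batch = random.choices(pool, weights=weights,
                               k=min(batch_size, len(pool)))
        for t in batch:
            t.uses += 1
        return batch
```

In this sketch the sample with both the highest reward and the largest TD error receives averaged rank 0 and hence the maximum weight, while repeatedly replayed samples eventually drop out of the eligible pool, which is the overfitting countermeasure the abstract describes.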
Keywords/Search Tags: Reinforcement Learning, Deep Reinforcement Learning, Deep Deterministic Policy Gradient, Deep Q-Network, Experience Replay Mechanism