
Deep Reinforcement Learning With Experience Replay

Posted on: 2021-04-27 | Degree: Master | Type: Thesis
Country: China | Candidate: S Chen | Full Text: PDF
GTID: 2428330605476889 | Subject: Computer Science and Technology
Abstract/Summary:
In the field of machine learning, deep reinforcement learning algorithms have been widely studied in recent years. The experience replay mechanism is an important technique in deep reinforcement learning, and how to improve sample utilization and overcome the inherent defects of experience replay is a hot issue in the field. The traditional experience replay mechanism samples transitions uniformly, which leads to low sample utilization and unsatisfactory agent performance. This thesis focuses on improving the efficiency of sample use in a large-scale sample space and proposes a series of improved deep reinforcement learning algorithms based on experience replay to solve the problems that experience replay brings. The main research content comprises the following three parts:

i. A linear dynamic frame skipping method is proposed. The frame skip rate of each action is not fixed but increases linearly with the Q value output by the network, so the agent determines each action's frame skip rate from the current state and the importance of the action, and the frame skip rate becomes a parameter that can be learned dynamically from the action value. In addition, when drawing samples from the experience pool to train the network, the effect of an action's frame skip rate on the sample priority is taken into account: a sample's priority is determined jointly by its temporal-difference error and its frame skip rate (see the first sketch below). Based on these improvements, a double deep Q-network algorithm with linear dynamic frame skipping and improved prioritized experience replay is proposed, and its effectiveness is verified on a series of Atari 2600 games.

ii. A mean-based asynchronous advantage actor-critic algorithm, called Averaged-A3C, is proposed. Averaged-A3C replaces the traditional experience replay mechanism with asynchronous training, so the algorithm no longer needs to store a large number of training samples, which saves storage overhead and allows on-policy reinforcement learning methods to be used. The algorithm updates the policy and the value function with the action advantage function, and using an averaged value estimate greatly reduces the variance of the computed advantage (see the second sketch below). The algorithm is evaluated on several tasks in the Atari 2600 and MuJoCo environments; experimental results show that, compared with the original A3C algorithm, Averaged-A3C effectively improves agent performance and the stability of the training process.

iii. A planning model based on a generative adversarial network, called GBPM, is proposed. The model exploits the strong representation capability of generative adversarial networks to obtain a more accurate environment model and to improve planning for experience replay. The GBPM can play the role of experience replay, so it can be applied to both model-based and model-free methods. During training, the GBPM is fitted to the real transition samples experienced by the agent, and the agent can then use the GBPM to generate simulated experience or trajectories that improve the learned policy (see the third sketch below). The GBPM module has been integrated into several deep reinforcement learning methods, such as deep Q-network and actor-critic, and experiments on Atari 2600 games and a maze problem show that it can effectively improve the performance of these algorithms.

Across deep reinforcement learning algorithms based on value functions, policies, models, and policy search, this thesis studies how to remedy the shortcomings of the existing experience replay mechanism and thereby improve algorithm performance.
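The first sketch below illustrates, in Python, the priority rule described in part i: an action's frame skip rate is mapped linearly from its Q value, and a sample's replay priority combines its temporal-difference error with that skip rate. The names (linear_frame_skip, FrameSkipReplayBuffer), the skip_weight coefficient, and the exact additive combination are illustrative assumptions, not the thesis's actual formulation.

```python
import numpy as np

def linear_frame_skip(q_value, q_min, q_max, skip_min=1, skip_max=10):
    """Map the chosen action's Q value linearly onto an integer frame skip rate (assumed mapping)."""
    frac = np.clip((q_value - q_min) / max(q_max - q_min, 1e-8), 0.0, 1.0)
    return int(round(skip_min + frac * (skip_max - skip_min)))

class FrameSkipReplayBuffer:
    """Prioritized replay whose priority mixes |TD error| with the action's frame skip rate."""

    def __init__(self, capacity, alpha=0.6, skip_weight=0.5):
        self.capacity = capacity
        self.alpha = alpha              # exponent shaping the sampling distribution
        self.skip_weight = skip_weight  # relative weight of the frame-skip term (assumed)
        self.data, self.priorities = [], []

    def add(self, transition, td_error, skip_rate, skip_max=10):
        # Priority grows with both the TD error magnitude and the normalised skip rate.
        priority = (abs(td_error) + self.skip_weight * skip_rate / skip_max) ** self.alpha
        if len(self.data) >= self.capacity:
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append(priority)

    def sample(self, batch_size):
        # Sample transitions with probability proportional to their priority.
        probs = np.asarray(self.priorities)
        probs = probs / probs.sum()
        idx = np.random.choice(len(self.data), size=batch_size, p=probs)
        return [self.data[i] for i in idx], idx
```

In a double DQN training loop, the skip rate returned by linear_frame_skip would both control how many frames the chosen action is repeated for and be stored with the transition so it can be passed to add().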
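The second sketch shows one plausible reading of the averaging idea in part ii: the critic baseline is the mean of the last K value-function snapshots, which lowers the variance of the one-step advantage estimate. The class names, the snapshot mechanism, and K itself are assumptions made for illustration; the thesis's Averaged-A3C may average differently.

```python
import numpy as np
from collections import deque

class AveragedValueBaseline:
    """Average the last K critic snapshots and use the mean as the baseline V(s)."""

    def __init__(self, k=5):
        self.snapshots = deque(maxlen=k)  # recent value-function snapshots (assumed mechanism)

    def update(self, value_fn):
        # value_fn: a callable state -> scalar, e.g. a frozen copy of the critic network.
        self.snapshots.append(value_fn)

    def value(self, state):
        if not self.snapshots:
            return 0.0
        return float(np.mean([v(state) for v in self.snapshots]))

def one_step_advantage(reward, gamma, state, next_state, done, baseline):
    """A(s, a) = r + gamma * V_avg(s') * (1 - done) - V_avg(s)."""
    bootstrap = 0.0 if done else gamma * baseline.value(next_state)
    return reward + bootstrap - baseline.value(state)
```

Each asynchronous worker would push its latest critic copy via update() and use one_step_advantage() (or an n-step variant) to weight its policy-gradient update.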
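The third sketch illustrates the role the GBPM of part iii plays as a learned replay/planning module: a generator maps (state, action, noise) to an imagined (next state, reward), and imagined transitions are mixed with real ones Dyna-style. Here the generator is a random linear map purely to keep the example runnable; in GBPM it would be trained adversarially against a discriminator on the agent's real transitions, and all names shown are illustrative.

```python
import numpy as np

class GBPMGeneratorStub:
    """Stand-in for the GAN generator: (state, action, noise) -> (next_state, reward)."""

    def __init__(self, state_dim, action_dim, noise_dim=8, seed=0):
        self.rng = np.random.default_rng(seed)
        self.noise_dim = noise_dim
        # Random linear map as a placeholder for the trained generator network.
        self.W = self.rng.normal(scale=0.1,
                                 size=(state_dim + action_dim + noise_dim, state_dim + 1))

    def sample(self, state, action):
        # Draw noise, push (state, action, noise) through the map, split off the reward.
        z = self.rng.normal(size=self.noise_dim)
        out = np.concatenate([state, action, z]) @ self.W
        return out[:-1], float(out[-1])   # imagined next state, imagined reward

def mixed_replay_batch(real_transitions, generator, n_imagined):
    """Combine real transitions with generator-imagined ones for a Dyna-style update."""
    batch = list(real_transitions)
    for _ in range(n_imagined):
        s, a, _, _ = real_transitions[np.random.randint(len(real_transitions))]
        s_next, r = generator.sample(s, a)
        batch.append((s, a, r, s_next))
    return batch
```

A value-based agent such as a DQN would compute its TD targets over the mixed batch exactly as it would over real replayed samples, which is consistent with the abstract's claim that the GBPM can be plugged into both model-free and model-based methods.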
Keywords/Search Tags:deep reinforcement learning, experience replay, frame skip rate, asynchronous actor critic, generative adversarial network