Deep Reinforcement Learning With Exploratory Noise

Posted on: 2020-05-30    Degree: Master    Type: Thesis
Country: China    Candidate: Y Yan    Full Text: PDF
GTID: 2370330578977967    Subject: Computer technology
Abstract/Summary:
Deep reinforcement learning combines deep neural networks with reinforcement learning and has achieved remarkable results in tasks that simulate real-world scenes. It exploits the perceptual power of deep neural networks and the autonomous decision-making ability of reinforcement learning: without complex manual preprocessing of environmental data, the agent can learn directly from the environment. However, maintaining the balance between exploration and exploitation has become a central problem in deep reinforcement learning. Traditional algorithms explore by dithering in the action space, which lowers decision-making efficiency and prevents the agent from making reasonable decisions. This thesis focuses on the exploration problem of deep reinforcement learning in large-scale sample spaces, combines noise mechanisms with deep reinforcement learning, and proposes deep reinforcement learning with exploratory noise. The main contributions can be summarized in three parts:

i. The deep deterministic policy gradient (DDPG) algorithm establishes a relation between time and action through Ornstein-Uhlenbeck (OU) noise to ensure that the agent can explore. Although OU noise makes the behavior policy stochastic, the agent does not make reasonable use of the consistency and optimality of its actions. To solve this problem, we introduce policy-space noise into DDPG and propose the deep deterministic policy gradient algorithm with threshold exploration (TE-DDPG). We analyze the advantages of policy-space exploration theoretically and design reasonable exploration timing and methods. Experiments in a series of complex simulated control environments demonstrate the superiority of the algorithm.

ii. When a deep reinforcement learning algorithm uses only a feedforward neural network, it is difficult to model correlations between states. Moreover, when a sufficiently deep network serves as the function approximator, a small weight change can produce a large change in the network output, biasing the agent's decisions. We propose a deep recurrent Q-network with exploratory noise (EN-DRQN) to address these problems, and validate it on a series of challenging video games.

iii. In the trust region policy optimization (TRPO) algorithm, the parameterized policy outputs the mean of the action, and Gaussian dithering with a fixed variance gives the agent its ability to explore the environment. However, such dithering produces only small oscillations around the action; its exploratory power is insufficient, so it is difficult to mine useful information from the interaction between the agent and the environment. To solve this problem, we propose trust region policy optimization with adaptive exploratory noise (TRPO-AEN). The algorithm combines isotropic exploration noise with directional, scalable noise and coordinates the two through the cumulative reward of each episode; a policy gradient approach updates the parameterized noise policy. A series of experiments shows that the agent's per-episode performance is further improved.
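For reference, part i builds on the standard Ornstein-Uhlenbeck noise process used for action-space exploration in DDPG. A minimal Python sketch follows (not the thesis code; the defaults theta = 0.15 and sigma = 0.2 are common choices in the DDPG literature, not values taken from the thesis):

import numpy as np

class OUNoise:
    """Temporally correlated noise: dx = theta * (mu - x) * dt + sigma * dW."""
    def __init__(self, dim, mu=0.0, theta=0.15, sigma=0.2, dt=1e-2, seed=None):
        self.mu = mu * np.ones(dim)
        self.theta, self.sigma, self.dt = theta, sigma, dt
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        # Restart the process at its long-run mean at each new episode.
        self.x = self.mu.copy()

    def sample(self):
        # Euler-Maruyama step of the OU stochastic differential equation.
        dx = self.theta * (self.mu - self.x) * self.dt \
             + self.sigma * np.sqrt(self.dt) * self.rng.standard_normal(self.x.shape)
        self.x = self.x + dx
        return self.x

# Usage: action = actor(state) + noise.sample()

Because consecutive samples are correlated, the perturbation drifts smoothly rather than jumping independently at each step, which is why OU noise relates time and action as described in part i.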
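The abstract names TE-DDPG but does not specify its threshold rule, so the following is only a hedged sketch of the general policy-space (parameter) noise idea it builds on. The return threshold that switches exploration off, and all names and values, are illustrative assumptions, not the thesis algorithm:

import copy
import torch

def perturbed_actor(actor: torch.nn.Module, stddev: float) -> torch.nn.Module:
    """Return a copy of the actor whose weights carry Gaussian perturbations."""
    noisy = copy.deepcopy(actor)
    with torch.no_grad():
        for p in noisy.parameters():
            p.add_(torch.randn_like(p) * stddev)
    return noisy

def select_action(actor, state, episode_return, threshold, stddev=0.1):
    # Hypothetical trigger: explore in policy space only while returns stay
    # below the threshold, so a well-performing policy acts consistently.
    policy = perturbed_actor(actor, stddev) if episode_return < threshold else actor
    with torch.no_grad():
        return policy(state)

Perturbing the weights rather than the output keeps a whole episode's actions consistent with one (noisy) policy, which is the "consistency of the action" that part i says pure OU dithering fails to exploit.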
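Part ii motivates replacing the feedforward approximator with a recurrent one. A minimal sketch of such a recurrent Q-network follows; since the abstract only names the exploratory-noise component of EN-DRQN, it is approximated here by Gaussian noise on the Q-values, which is an assumption:

import torch
import torch.nn as nn

class RecurrentQNet(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        # The LSTM carries hidden state across time steps, so correlations
        # between successive observations can influence the Q-estimates.
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq, hx=None, noise_std=0.0):
        out, hx = self.lstm(obs_seq, hx)   # (batch, time, hidden)
        q = self.head(out)                 # Q-value per action at each step
        if noise_std > 0:                  # exploratory noise (assumed form)
            q = q + noise_std * torch.randn_like(q)
        return q, hx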
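Part iii describes combining isotropic noise with directional, scalable noise, coordinated by the cumulative episode reward. The sketch below illustrates one possible such combination; the concrete direction, adaptation rule, and parameter names are assumptions, not the thesis's TRPO-AEN:

import numpy as np

def adaptive_noise(action_dim, direction, recent_returns,
                   sigma_iso=0.1, sigma_dir=0.3):
    """Mix isotropic and directed exploration noise for a continuous action."""
    # Assumed adaptation rule: weight directed exploration more heavily
    # when the episode return has stopped improving.
    improving = len(recent_returns) > 1 and recent_returns[-1] > recent_returns[-2]
    w = 0.2 if improving else 0.8
    iso = sigma_iso * np.random.standard_normal(action_dim)
    unit = direction / (np.linalg.norm(direction) + 1e-8)
    directed = sigma_dir * np.random.standard_normal() * unit
    return (1 - w) * iso + w * directed

Unlike fixed-variance Gaussian dithering around the policy mean, the directed component can scale along a chosen axis, which is the extra exploratory reach that part iii argues plain dithering lacks.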
Keywords/Search Tags:reinforcement learning, deep reinforcement learning, exploratory noise, policy gradient, trust region policy optimization