
Research On Deep Reinforcement Learning Algorithms In Continuous Action Space

Posted on: 2021-03-17
Degree: Master
Type: Thesis
Country: China
Candidate: H M Chen
Full Text: PDF
GTID: 2428330605976792
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, deep reinforcement learning algorithms have achieved impressive performance on many reinforcement learning tasks. Deep reinforcement learning combines the representational capability of deep learning with the autonomous decision-making capability of reinforcement learning. Continuous action space tasks are an important direction of deep reinforcement learning research and are generally solved with a deterministic actor-critic architecture. However, this architecture suffers from problems such as inappropriate exploration, unstable learning, and maximization bias. To address these problems, this thesis makes the following contributions:

i. The deep deterministic actor-critic algorithm performs unstably on some continuous tasks, and its exploration strategy of injecting external noise into the action space is blind, so the agent cannot quickly obtain good experience from exploration to support its learning. To address these problems, this thesis proposes an experience-guided deep deterministic actor-critic algorithm with multiple actors. Instead of an external noise source, the new algorithm uses a mechanism based on excellent past experience to guide the agent's learning and action selection, making the agent more inclined to choose high-return trajectories. The algorithm also uses a multi-actor-critic mechanism to alleviate the instability of learning with a single network. Experiments show that the new algorithm achieves better results on several continuous control tasks.

ii. To address the instability and improper exploration of the deep deterministic actor-critic, a self-guided deep deterministic actor-critic algorithm with multiple actors combined with a generative adversarial network structure is proposed. The algorithm uses a multi-actor-critic structure to alleviate the fluctuations of single-network learning. In the self-guided mechanism, the generator of the generative adversarial network serves as a guiding network that guides the agent's learning, while the discriminator provides a subjective reward, steering the agent toward high-return trajectories. The effectiveness of the model is verified on a series of complex continuous tasks.

iii. The way the Q-learning algorithm updates its value function causes maximization bias, and a similar phenomenon occurs in deterministic actor-critic algorithms. To reduce the impact of maximization bias on performance, a twin delayed experience-guided deep deterministic actor-critic with multiple actors is proposed. The algorithm uses a clipped double-critic mechanism together with delayed updates of the policy and target networks, as sketched below. Experimental results show that the new algorithm performs better on multiple tasks.
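The abstract names these mechanisms without giving code, so the following is a minimal PyTorch sketch of a generic clipped double-critic target with delayed policy and target-network updates, in the style of TD3. The network architecture, hyperparameters, and the update function are illustrative assumptions, not the thesis's implementation.

    import copy
    import torch
    import torch.nn as nn

    class MLP(nn.Module):
        """Small fully connected network used for both actor and critics (hypothetical sizes)."""
        def __init__(self, in_dim, out_dim, hidden=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(in_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, out_dim),
            )

        def forward(self, x):
            return self.net(x)

    state_dim, action_dim = 8, 2            # hypothetical task dimensions
    gamma, tau, policy_delay = 0.99, 0.005, 2

    actor = MLP(state_dim, action_dim)
    critic1 = MLP(state_dim + action_dim, 1)
    critic2 = MLP(state_dim + action_dim, 1)
    actor_t, critic1_t, critic2_t = map(copy.deepcopy, (actor, critic1, critic2))

    actor_opt = torch.optim.Adam(actor.parameters(), lr=3e-4)
    critic_opt = torch.optim.Adam(
        list(critic1.parameters()) + list(critic2.parameters()), lr=3e-4)

    def update(s, a, r, s2, done, step):
        """One training step on a sampled minibatch of transitions (shapes: batch x dim)."""
        with torch.no_grad():
            a2 = actor_t(s2).clamp(-1.0, 1.0)
            # Clipped double critic: bootstrap from the minimum of the two
            # target critics to suppress maximization bias in the target.
            q_next = torch.min(critic1_t(torch.cat([s2, a2], 1)),
                               critic2_t(torch.cat([s2, a2], 1)))
            target = r + gamma * (1.0 - done) * q_next
        sa = torch.cat([s, a], 1)
        critic_loss = ((critic1(sa) - target) ** 2).mean() + \
                      ((critic2(sa) - target) ** 2).mean()
        critic_opt.zero_grad()
        critic_loss.backward()
        critic_opt.step()

        # Delayed updates: refresh the actor and the target networks only
        # every `policy_delay` critic updates.
        if step % policy_delay == 0:
            actor_loss = -critic1(torch.cat([s, actor(s)], 1)).mean()
            actor_opt.zero_grad()
            actor_loss.backward()
            actor_opt.step()
            for net, net_t in ((actor, actor_t), (critic1, critic1_t), (critic2, critic2_t)):
                for p, p_t in zip(net.parameters(), net_t.parameters()):
                    p_t.data.mul_(1.0 - tau).add_(tau * p.data)

Taking the minimum of two independently trained critics biases the bootstrapped target downward rather than upward, which is the standard remedy for the maximization bias the abstract describes; the delayed updates keep the policy and target networks from chasing a rapidly changing critic.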
Keywords/Search Tags:reinforcement learning, deep reinforcement learning, deterministic actor-critic, guiding network, generative adversarial network