
Research On Deep Reinforcement Learning Algorithms In Continuous Action Space

Posted on: 2021-03-17
Degree: Master
Type: Thesis
Country: China
Candidate: H M Chen
Full Text: PDF
GTID: 2428330605976792
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, deep reinforcement learning algorithms have achieved impressive performance on many reinforcement learning tasks. Deep reinforcement learning combines the representational capability of deep learning with the autonomous decision-making capability of reinforcement learning. Continuous action space tasks are an important direction of deep reinforcement learning research and are generally solved with a deterministic actor-critic architecture. However, this architecture suffers from problems such as inappropriate exploration, unstable learning, and maximization bias. To address these problems, this thesis makes the following contributions:

i. The deep deterministic actor-critic algorithm performs unstably on some continuous tasks, and its exploration strategy of injecting external noise into the action space is blind, so the agent cannot quickly obtain good experience from exploration to support its learning. To address these problems, this thesis proposes an experience-guided deep deterministic actor-critic algorithm with multiple actors. Instead of an external noise source, the new algorithm uses a mechanism based on excellent past experience to guide the agent's learning and action selection, making the agent more inclined to choose high-return trajectories. The algorithm also uses a multi-actor-critic mechanism to alleviate the instability of learning with a single network. Experiments show that the new algorithm achieves better results on several continuous control tasks.

ii. To address the instability and improper exploration of the deep deterministic actor-critic, a self-guided deep deterministic actor-critic algorithm with multiple actors combined with a generative adversarial network structure is proposed. The algorithm uses a multi-actor-critic structure to alleviate the fluctuations of single-network learning. In the self-guided mechanism, the generator of the generative adversarial network serves as a guiding network that guides the agent's learning, while the discriminator provides a subjective reward, steering the agent toward high-return trajectories. The effectiveness of the model is verified on a series of complex continuous tasks.

iii. The way the Q-learning algorithm updates its value function causes maximization bias, and a similar phenomenon occurs in deterministic actor-critic algorithms. To reduce the impact of maximization bias on performance, a twin delayed experience-guided deep deterministic actor-critic with multiple actors is proposed. The algorithm uses a clipped double-critic mechanism together with delayed updates of the policy and target networks, as sketched below. Experimental results show that the new algorithm performs better on multiple tasks.
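The abstract names these mechanisms without giving code, so the following is a minimal PyTorch sketch of a generic clipped double-critic target with delayed policy and target-network updates, in the style of TD3. The network architecture, hyperparameters, and the update function are illustrative assumptions, not the thesis's implementation.

    import copy
    import torch
    import torch.nn as nn

    class MLP(nn.Module):
        """Small fully connected network used for both actor and critics (hypothetical sizes)."""
        def __init__(self, in_dim, out_dim, hidden=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(in_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, out_dim),
            )

        def forward(self, x):
            return self.net(x)

    state_dim, action_dim = 8, 2            # hypothetical task dimensions
    gamma, tau, policy_delay = 0.99, 0.005, 2

    actor = MLP(state_dim, action_dim)
    critic1 = MLP(state_dim + action_dim, 1)
    critic2 = MLP(state_dim + action_dim, 1)
    actor_t, critic1_t, critic2_t = map(copy.deepcopy, (actor, critic1, critic2))

    actor_opt = torch.optim.Adam(actor.parameters(), lr=3e-4)
    critic_opt = torch.optim.Adam(
        list(critic1.parameters()) + list(critic2.parameters()), lr=3e-4)

    def update(s, a, r, s2, done, step):
        """One training step on a sampled minibatch of transitions (shapes: batch x dim)."""
        with torch.no_grad():
            a2 = actor_t(s2).clamp(-1.0, 1.0)
            # Clipped double critic: bootstrap from the minimum of the two
            # target critics to suppress maximization bias in the target.
            q_next = torch.min(critic1_t(torch.cat([s2, a2], 1)),
                               critic2_t(torch.cat([s2, a2], 1)))
            target = r + gamma * (1.0 - done) * q_next
        sa = torch.cat([s, a], 1)
        critic_loss = ((critic1(sa) - target) ** 2).mean() + \
                      ((critic2(sa) - target) ** 2).mean()
        critic_opt.zero_grad()
        critic_loss.backward()
        critic_opt.step()

        # Delayed updates: refresh the actor and the target networks only
        # every `policy_delay` critic updates.
        if step % policy_delay == 0:
            actor_loss = -critic1(torch.cat([s, actor(s)], 1)).mean()
            actor_opt.zero_grad()
            actor_loss.backward()
            actor_opt.step()
            for net, net_t in ((actor, actor_t), (critic1, critic1_t), (critic2, critic2_t)):
                for p, p_t in zip(net.parameters(), net_t.parameters()):
                    p_t.data.mul_(1.0 - tau).add_(tau * p.data)

Taking the minimum of two independently trained critics biases the bootstrapped target downward rather than upward, which is the standard remedy for the maximization bias the abstract describes; the delayed updates keep the policy and target networks from chasing a rapidly changing critic.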
Keywords/Search Tags:reinforcement learning, deep reinforcement learning, deterministic actor-critic, guiding network, generative adversarial network