
Asynchronous Deep Reinforcement Learning With Multiple Gating Mechanisms

Posted on: 2019-02-09
Degree: Master
Type: Thesis
Country: China
Candidate: J Xu
Full Text: PDF
GTID: 2428330545951221
Subject: Computer Science and Technology
Abstract/Summary:
Deep reinforcement learning (DRL) combines deep learning and reinforcement learning, using the perception ability of deep learning and the decision-making ability of reinforcement learning to enable agents to learn directly from raw data. However, DRL algorithms depend heavily on computing resources, and special hardware such as graphics processing units and tensor processing units is needed to accelerate their training. Although recurrent neural networks can effectively handle tasks with dependencies among states at different time steps, they require much more computing time than feed-forward networks. In addition, although DRL algorithms have achieved remarkable results in areas such as robotic control and games, their instability remains a serious problem. This paper focuses on the training time, performance, and stability of DRL algorithms. Multiple gating mechanisms, skip connections, and an adaptive clipping region are introduced into DRL algorithms, and asynchronous deep reinforcement learning with multiple gating mechanisms is proposed. The main research content can be summarized in the following three parts:

(1) Asynchronous advantage actor-critic algorithm with multiple gating mechanisms (A3C-MGM). When a DRL algorithm uses only a feed-forward neural network, it cannot handle tasks with dependencies among states at different time steps. Although recurrent neural networks can memorize this dependency information through their recurrent connection structure, they cannot exploit parallel computing to speed up training and therefore require more training time. To address this problem, multiple gating mechanisms are introduced into asynchronous DRL, and the asynchronous advantage actor-critic algorithm with multiple gating mechanisms is proposed. We analyze the advantages of multiple gating mechanisms and verify the effectiveness of the algorithm on several complex video games.

(2) Asynchronous advantage actor-critic algorithm with skip connections (A3C-skip). DRL algorithms make agents more effective in tasks with complex states by exploiting the ability of deep neural networks to recognize high-dimensional state spaces. However, an asynchronous DRL algorithm with multiple gating mechanisms that uses only a feed-forward network may behave unstably during learning because it misidentifies state characteristics. To address this problem, the asynchronous advantage actor-critic algorithm with skip connections is proposed. The effectiveness of the skip connection is analyzed theoretically, and the effectiveness of the algorithm is verified on several complex video games.

(3) The above two lines of research are based on the deep neural network model. However, the training effect and stability of DRL algorithms are closely related not only to the network model but also to the reinforcement learning algorithm itself. For this reason, an adaptive clipping region is introduced into DRL, and a DRL algorithm based on proximal policy optimization with an adaptive region (PPO-AR), as well as PPO-AR with multiple gating mechanisms (PPO-AR-MGM), are proposed; their effectiveness is verified experimentally.
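The gated feed-forward idea in part (1) can be sketched as follows. The abstract does not spell out the exact gate equations of A3C-MGM, so the layer below is only a hypothetical illustration: GRU-style update and output gates applied inside a purely feed-forward layer (no recurrent state), so each time step stays independently parallelizable.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class GatedFeedForwardLayer:
    """Feed-forward layer with GRU-style gates but no recurrent state.

    Hypothetical sketch: the weight names (W_h, W_g, W_u, W_o) and the
    mixing rule are illustrative assumptions, not the thesis's design.
    """

    def __init__(self, in_dim, out_dim, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(in_dim)
        self.W_h = rng.uniform(-s, s, (out_dim, in_dim))   # candidate features
        self.W_g = rng.uniform(-s, s, (out_dim, in_dim))   # alternative candidate
        self.W_u = rng.uniform(-s, s, (out_dim, in_dim))   # update gate
        self.W_o = rng.uniform(-s, s, (out_dim, out_dim))  # output gate

    def forward(self, x):
        h = np.tanh(self.W_h @ x)      # candidate, bounded in (-1, 1)
        g = np.tanh(self.W_g @ x)      # second candidate, bounded in (-1, 1)
        u = sigmoid(self.W_u @ x)      # update gate in (0, 1)
        mixed = u * h + (1.0 - u) * g  # convex mix, still in (-1, 1)
        o = sigmoid(self.W_o @ mixed)  # output gate in (0, 1)
        return o * mixed               # gated output, bounded in (-1, 1)
```

Because no hidden state is carried across time steps, a batch of observations can be pushed through such a layer in one matrix multiplication, unlike an RNN.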
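The skip connection in part (2) adds a layer's input back onto its transformed output, so raw state features and gradients can bypass the intermediate transformation. A minimal sketch (the weight shapes and ReLU choice are illustrative, not the thesis's exact architecture):

```python
import numpy as np

def residual_block(x, W1, W2):
    """One feed-forward block with a skip connection: W2 @ relu(W1 @ x) + x.

    W1 has shape (hidden, d) and W2 has shape (d, hidden), so the
    transformed output matches x and the two paths can be summed.
    """
    h = np.maximum(0.0, W1 @ x)  # ReLU hidden transform
    return W2 @ h + x            # skip path adds the input back
```

With zero weights the block reduces to the identity, which is why skip connections make deep stacks easier to train: the block only has to learn a correction to the input rather than the full mapping.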
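The objective in part (3) builds on the standard PPO clipped surrogate. The abstract does not give the adaptive rule for the clipping region, so the per-sample epsilon below (tightening the region where the advantage magnitude is large) is only one plausible choice, not PPO-AR's actual rule:

```python
import numpy as np

def ppo_clip_objective(ratios, advantages, eps):
    """Standard PPO clipped surrogate (to be maximized).

    ratios: pi_new(a|s) / pi_old(a|s) per sample; eps may be a
    scalar (vanilla PPO) or a per-sample array (adaptive region).
    """
    unclipped = ratios * advantages
    clipped = np.clip(ratios, 1.0 - eps, 1.0 + eps) * advantages
    return np.minimum(unclipped, clipped).mean()

def adaptive_eps(advantages, eps0=0.2):
    """Hypothetical adaptive region: shrink eps where |advantage| is large."""
    return eps0 / (1.0 + np.abs(advantages))
```

Passing `adaptive_eps(advantages)` instead of a fixed `eps` keeps the update conservative exactly on the samples that would otherwise move the policy the most.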
Keywords/Search Tags:deep reinforcement learning, asynchronous methods, multiple gating, skip connection, policy optimization