
Asynchronous Deep Reinforcement Learning With Multiple Gating Mechanisms

Posted on: 2019-02-09
Degree: Master
Type: Thesis
Country: China
Candidate: J Xu
Full Text: PDF
GTID: 2428330545951221
Subject: Computer Science and Technology
Abstract/Summary:
Deep reinforcement learning (DRL) combines deep learning and reinforcement learning, using the perception ability of deep learning and the decision-making ability of reinforcement learning to enable agents to learn directly from raw data. However, DRL algorithms depend heavily on computing resources, and special hardware such as graphics processing units and tensor processing units is needed to accelerate their training. Although recurrent neural networks can effectively handle tasks with dependencies among states at different time steps, they require much more computing time than feed-forward networks. In addition, although DRL algorithms have achieved remarkable results in areas such as robotic control and games, their instability remains a serious problem. This paper focuses on the training time, performance, and stability of DRL algorithms. Multiple gating mechanisms, skip connections, and an adaptive clipping region are introduced into DRL algorithms, and asynchronous deep reinforcement learning with multiple gating mechanisms is proposed. The main research content can be summarized in the following three parts:

(1) Asynchronous advantage actor-critic algorithm with multiple gating mechanisms (A3C-MGM). When a DRL algorithm uses only a feed-forward neural network, it cannot handle tasks with dependencies among states at different time steps. Although recurrent neural networks can memorize this dependency information through their recurrent connection structure, they cannot exploit parallel computing to speed up training and therefore require more training time. To address this problem, multiple gating mechanisms are introduced into asynchronous DRL, and the asynchronous advantage actor-critic algorithm with multiple gating mechanisms is proposed. We analyze the advantages of multiple gating mechanisms and verify the effectiveness of the algorithm on several complex video games.

(2) Asynchronous advantage actor-critic algorithm with skip connections (A3C-skip). DRL algorithms make agents more effective in tasks with complex states by exploiting the ability of deep neural networks to recognize high-dimensional state spaces. However, an asynchronous DRL algorithm with multiple gating mechanisms that uses only a feed-forward network may behave unstably during learning because it misidentifies state characteristics. To address this problem, the asynchronous advantage actor-critic algorithm with skip connections is proposed. The effectiveness of the skip connection is analyzed theoretically, and the effectiveness of the algorithm is verified on several complex video games.

(3) The above two lines of research are based on the deep neural network model. However, the training effect and stability of DRL algorithms are closely related not only to the network model but also to the reinforcement learning algorithm itself. For this reason, an adaptive clipping region is introduced into DRL, and a DRL algorithm based on proximal policy optimization with an adaptive region (PPO-AR), as well as PPO-AR with multiple gating mechanisms (PPO-AR-MGM), are proposed; their effectiveness is verified experimentally.
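The gated feed-forward idea in part (1) can be sketched as follows. The abstract does not spell out the exact gate equations of A3C-MGM, so the layer below is only a hypothetical illustration: GRU-style update and output gates applied inside a purely feed-forward layer (no recurrent state), so each time step stays independently parallelizable.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class GatedFeedForwardLayer:
    """Feed-forward layer with GRU-style gates but no recurrent state.

    Hypothetical sketch: the weight names (W_h, W_g, W_u, W_o) and the
    mixing rule are illustrative assumptions, not the thesis's design.
    """

    def __init__(self, in_dim, out_dim, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(in_dim)
        self.W_h = rng.uniform(-s, s, (out_dim, in_dim))   # candidate features
        self.W_g = rng.uniform(-s, s, (out_dim, in_dim))   # alternative candidate
        self.W_u = rng.uniform(-s, s, (out_dim, in_dim))   # update gate
        self.W_o = rng.uniform(-s, s, (out_dim, out_dim))  # output gate

    def forward(self, x):
        h = np.tanh(self.W_h @ x)      # candidate, bounded in (-1, 1)
        g = np.tanh(self.W_g @ x)      # second candidate, bounded in (-1, 1)
        u = sigmoid(self.W_u @ x)      # update gate in (0, 1)
        mixed = u * h + (1.0 - u) * g  # convex mix, still in (-1, 1)
        o = sigmoid(self.W_o @ mixed)  # output gate in (0, 1)
        return o * mixed               # gated output, bounded in (-1, 1)
```

Because no hidden state is carried across time steps, a batch of observations can be pushed through such a layer in one matrix multiplication, unlike an RNN.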
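The skip connection in part (2) adds a layer's input back onto its transformed output, so raw state features and gradients can bypass the intermediate transformation. A minimal sketch (the weight shapes and ReLU choice are illustrative, not the thesis's exact architecture):

```python
import numpy as np

def residual_block(x, W1, W2):
    """One feed-forward block with a skip connection: W2 @ relu(W1 @ x) + x.

    W1 has shape (hidden, d) and W2 has shape (d, hidden), so the
    transformed output matches x and the two paths can be summed.
    """
    h = np.maximum(0.0, W1 @ x)  # ReLU hidden transform
    return W2 @ h + x            # skip path adds the input back
```

With zero weights the block reduces to the identity, which is why skip connections make deep stacks easier to train: the block only has to learn a correction to the input rather than the full mapping.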
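The objective in part (3) builds on the standard PPO clipped surrogate. The abstract does not give the adaptive rule for the clipping region, so the per-sample epsilon below (tightening the region where the advantage magnitude is large) is only one plausible choice, not PPO-AR's actual rule:

```python
import numpy as np

def ppo_clip_objective(ratios, advantages, eps):
    """Standard PPO clipped surrogate (to be maximized).

    ratios: pi_new(a|s) / pi_old(a|s) per sample; eps may be a
    scalar (vanilla PPO) or a per-sample array (adaptive region).
    """
    unclipped = ratios * advantages
    clipped = np.clip(ratios, 1.0 - eps, 1.0 + eps) * advantages
    return np.minimum(unclipped, clipped).mean()

def adaptive_eps(advantages, eps0=0.2):
    """Hypothetical adaptive region: shrink eps where |advantage| is large."""
    return eps0 / (1.0 + np.abs(advantages))
```

Passing `adaptive_eps(advantages)` instead of a fixed `eps` keeps the update conservative exactly on the samples that would otherwise move the policy the most.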
Keywords/Search Tags:deep reinforcement learning, asynchronous methods, multiple gating, skip connection, policy optimization