
Research On Multi-agent Reinforcement Learning

Posted on: 2021-02-22
Degree: Master
Type: Thesis
Country: China
Candidate: W Du
Full Text: PDF
GTID: 2428330626958725
Subject: Computer technology
Abstract/Summary:
Reinforcement learning (RL) is often regarded as a general formalization of decision-making tasks and is closely linked to dynamic programming and game theory. Multi-agent reinforcement learning is an important branch of multi-agent systems research. It applies reinforcement learning and game theory to multi-agent systems, enabling multiple agents to complete more complicated tasks through interaction and decision-making in high-dimensional, dynamic real-world scenes. With the rapid development and wide application of deep neural networks, more and more traditional reinforcement learning algorithms have been combined with them, forming deep reinforcement learning methods that address higher-dimensional and more complex real-world problems. The application of deep reinforcement learning to multi-agent systems has become an active research frontier. The research in this thesis is as follows:

1. The latest research progress in multi-agent reinforcement learning is studied, especially the application of deep reinforcement learning to multi-agent systems. The theoretical background of multi-agent reinforcement learning is introduced, and the learning goals and classical algorithms proposed in the literature are summarized. Recent developments in multi-agent deep reinforcement learning are reviewed, and the most advanced algorithms are classified from several angles: scalability, non-stationarity, partial observability, and communication learning. Finally, the application prospects and further research directions of multi-agent reinforcement learning are summarized.

2. Algorithms based on the multi-agent deep deterministic policy gradient model are studied. This model is built on the actor-critic framework, but an agent trained with an actor-critic network tends to overestimate the value function, which leads to a poorly learned policy and non-convergent agent behavior. To solve this problem, this thesis adopts a framework of centralized training with decentralized execution and proposes a new algorithm, the multi-agent double deep deterministic policy gradient (MA3DPG) algorithm. First, inspired by the traditional reinforcement learning method Double Q-learning, we use a double-critic network structure and take the minimum of the values estimated by a pair of independently trained critic networks to reduce overestimation, so that the agent can learn the optimal policy. Second, we adopt delayed policy updates: the policy network is updated less frequently than the value network, so that the value error is reduced before each policy update, which addresses the difficulty of getting agent behavior to converge. We also accelerate training with prioritized batch processing. We compare our algorithm with Multi-Agent Deep Deterministic Policy Gradient (MADDPG), a state-of-the-art approach, on OpenAI Gym tasks and show that our method outperforms it.

3. An asynchronous reinforcement learning algorithm based on an improved parallel particle swarm optimization algorithm is studied. Asynchronous learning methods are gradually being applied to multi-agent systems, and reinforcement learning, as an important machine learning method, plays an increasingly important role in practical applications. In recent years, many researchers have studied asynchronous reinforcement learning algorithms and achieved remarkable results in many applications. However, because the search scope of the agents is limited, existing asynchronous reinforcement learning methods often fail to reduce the number of episodes the algorithm needs to run. In addition, traditional model-free reinforcement learning algorithms do not necessarily converge to the optimal solution, which may waste resources in practical applications. To address these problems, we apply the particle swarm optimization (PSO) algorithm to asynchronous reinforcement learning to search for the optimal solution. First, we propose a new asynchronous variant of the PSO algorithm. Then we apply it to asynchronous reinforcement learning and propose a new algorithm, the Sarsa algorithm based on backward Q-learning and asynchronous particle swarm optimization (APSO-BQSA). Finally, we verify the effectiveness of the proposed asynchronous PSO and APSO-BQSA algorithms through experiments.

This thesis has 19 figures, 7 tables and 86 references.
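
The double-critic target and the delayed policy update described in the second contribution can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function names, the scalar inputs, and the default `policy_delay=2` are hypothetical, and the critics are represented only by their output values. The target takes the form y = r + gamma * min(Q1', Q2'), where Q1' and Q2' are the two critics' estimates for the next state.

```python
def double_critic_target(reward, gamma, q1_next, q2_next, done):
    """TD target built from the smaller of two critic estimates.

    Taking the minimum counteracts the overestimation that a single
    critic tends to accumulate. All arguments are plain floats here
    (an assumption made to keep the sketch self-contained).
    """
    if done:
        # No bootstrapping from a terminal state.
        return reward
    return reward + gamma * min(q1_next, q2_next)


def should_update_policy(step, policy_delay=2):
    """Delayed policy update: the policy network is updated once
    every `policy_delay` value-network updates (hypothetical default)."""
    return step % policy_delay == 0


# Usage: the two critics disagree, so the lower estimate is used.
target = double_critic_target(reward=1.0, gamma=0.9,
                              q1_next=2.0, q2_next=3.0, done=False)
policy_steps = [s for s in range(1, 7) if should_update_policy(s)]
```

With a delay of 2, the policy is updated on every second step, so the critics receive two updates for each policy update.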
Keywords/Search Tags:reinforcement learning, multi-agent system, deep learning, asynchronous method