Many real-world problems, such as urban traffic control and public resource appropriation, can be treated as multi-agent problems. In recent years, more and more researchers have tried to tackle multi-agent problems with reinforcement learning (RL) algorithms. Traditional single-agent reinforcement learning algorithms often fail to learn the cooperation between different agents, which is vital in multi-agent problems. A promising solution is to establish a communication protocol among agents. However, existing approaches often generalize poorly, especially in tasks with partial observability and a dynamically varying number of agents. This thesis therefore focuses on designing suitable algorithms and models to address the "learning-to-communicate" problem in multi-agent systems. The main contributions can be summarized as follows:

1. We propose an attention-based communication information processing module. This module captures the relationships between agents from their observations; from these relationships we derive the weights of the different communication messages, which in turn decide which agents to communicate with (sketched below).

2. We also propose a novel Vector of Locally Aggregated Descriptors (VLAD) based communication information processing module. This module learns an effective representation of all agents' states and further improves the cooperation performance between agents (also sketched below). We evaluate the proposed method on two partially observable, fully cooperative multi-agent games; our model is effective and improves performance by a large margin over state-of-the-art methods.

3. We extend the VLAD-based communication information processing module with a gating mechanism so that our framework can handle tasks in which agents are not fully cooperative. We also embed the processing module into the actor-critic framework so that it can be applied to multi-agent environments with continuous action spaces. In a non-fully-cooperative multi-agent game and a continuous-action-space multi-agent game, our models also outperform the baseline models and show excellent generalization.
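To make the attention-based weighting in contribution 1 concrete, the following is a minimal illustrative sketch rather than the thesis implementation: the class and dimension names (AttentionComm, obs_dim, msg_dim, hidden_dim) are assumptions, and the module simply scores each sender's observation encoding against the receiver's and uses the softmax weights to aggregate the incoming messages.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AttentionComm(nn.Module):
        # Hypothetical module: weights incoming messages by how related each
        # sender's observation encoding is to the receiver's.
        def __init__(self, obs_dim, msg_dim, hidden_dim=64):
            super().__init__()
            self.query = nn.Linear(obs_dim, hidden_dim)  # receiver observation -> query
            self.key = nn.Linear(obs_dim, hidden_dim)    # sender observations -> keys
            self.value = nn.Linear(msg_dim, hidden_dim)  # sender messages -> values

        def forward(self, own_obs, other_obs, other_msgs):
            # own_obs: (obs_dim,), other_obs: (n-1, obs_dim), other_msgs: (n-1, msg_dim)
            q = self.query(own_obs)              # (hidden_dim,)
            k = self.key(other_obs)              # (n-1, hidden_dim)
            v = self.value(other_msgs)           # (n-1, hidden_dim)
            scores = k @ q / k.shape[-1] ** 0.5  # scaled dot-product relation scores
            weights = F.softmax(scores, dim=0)   # which agents to listen to
            return weights @ v                   # aggregated communication vector

    # Example: one agent attending over messages from four teammates.
    comm = AttentionComm(obs_dim=16, msg_dim=8)
    summary = comm(torch.randn(16), torch.randn(4, 16), torch.randn(4, 8))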
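The VLAD-based aggregation in contribution 2 can likewise be sketched as a NetVLAD-style soft-assignment layer. Again, this is only an assumed illustration (the class name VLADAggregator, the number of clusters, and the normalization steps are not taken from the thesis); it shows the design point that matters for generalization: a variable number of agent encodings is pooled into a descriptor whose size does not depend on how many agents are present.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class VLADAggregator(nn.Module):
        # Assumed NetVLAD-style layer: pools a variable number of agent encodings
        # into one fixed-size descriptor via soft assignment to learned centroids.
        def __init__(self, feat_dim, num_clusters=8):
            super().__init__()
            self.assign = nn.Linear(feat_dim, num_clusters)  # soft cluster assignment
            self.centroids = nn.Parameter(torch.randn(num_clusters, feat_dim))

        def forward(self, agent_feats):
            # agent_feats: (n_agents, feat_dim) -- n_agents may vary between episodes
            a = F.softmax(self.assign(agent_feats), dim=-1)         # (n_agents, K)
            residuals = agent_feats.unsqueeze(1) - self.centroids   # (n_agents, K, feat_dim)
            vlad = (a.unsqueeze(-1) * residuals).sum(dim=0)         # (K, feat_dim)
            vlad = F.normalize(vlad, dim=-1)                        # intra-normalization
            return F.normalize(vlad.flatten(), dim=0)               # (K * feat_dim,) descriptor

    # The descriptor has the same size for 5 agents as for 50.
    agg = VLADAggregator(feat_dim=32)
    descriptor = agg(torch.randn(5, 32))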