
Cooperation Promotion Multi-Agent Reinforcement Learning

Posted on: 2022-12-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W H Li
Full Text: PDF
GTID: 1488306773484114
Subject: Computer Science and Technology
Abstract/Summary:
From everyday life to global challenges, cooperation problems, in which agents seek ways to work together to improve their collective well-being, are widespread and arise at every scale. As AI-powered machines play an increasing role in our lives, it will be essential to equip them, through algorithms, with the capabilities they need to cooperate with others (humans and machines alike) on problems of different scales. Multi-agent reinforcement learning (MARL), a field at the crossroads of reinforcement learning, control theory, game theory, deep learning, and social psychology, has made impressive progress on complex cooperative tasks. In MARL, the changes in an agent's external environment and the rewards it obtains no longer depend only on its own actions but also on the actions of the other agents. A MARL algorithm that promotes cooperation must therefore equip each agent with the capability to model other agents. This thesis categorizes that capability into three increasingly demanding levels: behavior understanding, capability understanding, and intention understanding. Moreover, to trade off training cost against algorithm performance, and to highlight the effectiveness of the different capability levels in solving cooperation problems, the thesis divides multi-agent cooperation problems into three categories of increasing difficulty: homogeneous, fixed-number multi-agent scenarios; heterogeneous, time-varying multi-agent scenarios; and large-scale multi-agent scenarios. These scenarios correspond to the three capability levels above, and the thesis abstracts the critical research problem in each: the non-stationarity problem, the relational modeling problem, and the organization control problem, respectively.

To address the non-stationarity problem in behavior understanding, this thesis introduces a new concept, δ-stationarity, to explicitly quantify the non-stationarity of the policy sequences generated by the agents' learning process. It further proves that δ-stationarity is bounded by the KL divergence between the agents' joint policies. Modeling the joint policy of all agents as a pairwise Markov random field, the thesis proposes a message-passing-based mirror descent trust-region decomposition algorithm (MAMT) that estimates the KL divergence of the joint policy more accurately. MAMT brings significant and stable performance improvements over the baselines on cooperative tasks of varying complexity, and it scales well.
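To make the idea concrete, the sketch below measures the non-stationarity of a learning process as the largest KL divergence between consecutive joint policies, factored over agents. It is a minimal, hypothetical illustration: the function names, the categorical policies, and the independent (product) factorization of the joint policy are assumptions made here for brevity; the thesis instead models the joint policy as a pairwise Markov random field and estimates the divergence via message passing.

```python
import numpy as np

def kl_categorical(p, q, eps=1e-12):
    """KL(p || q) between two categorical distributions on the same support."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def joint_policy_kl(old_policies, new_policies):
    """Joint-policy KL under an assumed independent factorization:
    the joint KL decomposes into a sum of per-agent KL terms."""
    return sum(kl_categorical(p, q) for p, q in zip(old_policies, new_policies))

def non_stationarity(policy_sequence):
    """Largest step-to-step joint-policy KL along a training run --
    an illustrative stand-in for the thesis's delta-stationarity measure."""
    return max(joint_policy_kl(a, b)
               for a, b in zip(policy_sequence, policy_sequence[1:]))

# Two agents with three actions each; policies drift over three iterations.
sequence = [
    [[0.60, 0.30, 0.10], [0.50, 0.25, 0.25]],
    [[0.55, 0.35, 0.10], [0.45, 0.30, 0.25]],
    [[0.50, 0.40, 0.10], [0.40, 0.35, 0.25]],
]
print(f"estimated non-stationarity: {non_stationarity(sequence):.4f}")
```

A trust-region scheme in this spirit constrains each policy update so that the measured divergence stays below a target threshold, which makes the other agents' learning processes look approximately stationary to each individual learner.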
To address the relational modeling problem in capability understanding, this thesis proposes a cooperative MARL algorithm, the hierarchical action space representation learning algorithm (SCORE), which employs graph attention networks to capture the dependencies between heterogeneous agents. To enable more accurate relational modeling among heterogeneous agents, SCORE introduces a hierarchical variational autoencoder that maps the action spaces of all heterogeneous agents into a shared latent action space. The thesis also proposes a novel transfer learning framework that enables fast adaptation to multi-agent environments containing new agent types without retraining the entire SCORE algorithm, while preserving existing policies. Numerical experiments on proof-of-concept tasks and precision agriculture tasks show that, compared to the baselines, SCORE models the dependencies among heterogeneous agents more accurately and offers significant transferability advantages in time-varying scenarios.
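The sketch below illustrates the central representational idea: per-agent-type encoder and decoder heads wrapped around a single shared latent action space, so that agents with differently sized action spaces can be compared and coordinated in one representation. It is a simplified, single-level stand-in written for this summary; the class name, dimensions, and loss weighting are assumptions, and SCORE's graph attention networks and the hierarchical structure of its variational autoencoder are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ActionSpaceVAE(nn.Module):
    """Per-agent-type encoder/decoder heads around one shared latent action
    space -- a simplified, single-level stand-in for a hierarchical VAE."""
    def __init__(self, action_dims, latent_dim=8):
        super().__init__()
        self.encoders = nn.ModuleList(
            [nn.Linear(d, 2 * latent_dim) for d in action_dims])
        self.decoders = nn.ModuleList(
            [nn.Linear(latent_dim, d) for d in action_dims])

    def forward(self, agent_type, one_hot_action):
        mu, log_var = self.encoders[agent_type](one_hot_action).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()  # reparameterize
        logits = self.decoders[agent_type](z)  # reconstruct in the type's own space
        return z, logits, mu, log_var

# Two agent types with action spaces of different sizes (4 vs. 6 actions).
vae = ActionSpaceVAE(action_dims=[4, 6])
action = F.one_hot(torch.tensor([2]), num_classes=4).float()
z, logits, mu, log_var = vae(agent_type=0, one_hot_action=action)

recon = F.cross_entropy(logits, torch.tensor([2]))           # reconstruction term
kl = -0.5 * (1 + log_var - mu.pow(2) - log_var.exp()).sum()  # prior-matching term
loss = recon + kl                                            # standard ELBO pieces
print(z.shape, float(loss))
```

Training such heads jointly on trajectories from all agent types yields one latent space in which behavior learned for one type can, in principle, be decoded through another type's head, which is one way the transfer to new agent types described above could be realized.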
To address the organization control problem in intention understanding, this thesis proposes a cooperative MARL algorithm, the structured cooperation emergence algorithm (ROCHICO). ROCHICO first learns adaptive grouping strategies through a reinforced organization control module built on independent MARL. Once groups are formed by this module, ROCHICO introduces a novel combination of self-supervised and unsupervised methods to decompose the joint intention of all agents and learn the diverse group intentions corresponding to each group. It then models each agent's individual intention with variational autoencoders and introduces a hierarchical intention module with consensus constraints, which helps agents understand the individual intentions of intra-group agents and the group intentions of extra-group agents. Finally, a multi-agent decision module outputs the final multi-agent cooperation policy. Numerical experiments on four large-scale multi-agent cooperation tasks show that ROCHICO groups agents flexibly according to the other agents' behaviors and task completion, and that its individual- and group-intention understanding is accurate and effective, significantly outperforming the baselines in exploration efficiency and cooperation intensity.
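One plausible way to read the consensus constraint is sketched below: each agent's individual-intention embedding is pulled toward the intention embedding of its own group, while the group intentions themselves are pushed apart to stay diverse. Everything here is an illustrative assumption, the embedding view of intentions, both loss terms, and their weighting included; the reinforced organization control module, the self-supervised intention decomposition, and the VAE-based intention models are not reproduced.

```python
import torch

def consensus_loss(individual_z, group_ids, group_z):
    """Pull each agent's intention embedding toward its group's intention."""
    return ((individual_z - group_z[group_ids]) ** 2).sum(dim=-1).mean()

def diversity_loss(group_z):
    """Push group intentions apart so the groups stay behaviorally diverse."""
    dists = torch.cdist(group_z, group_z)                     # pairwise distances
    off_diag = dists[~torch.eye(len(group_z), dtype=torch.bool)]
    return -off_diag.mean()                                   # reward separation

# Six agents in two groups, with 4-dimensional intention embeddings.
individual_z = torch.randn(6, 4, requires_grad=True)
group_z = torch.randn(2, 4, requires_grad=True)
group_ids = torch.tensor([0, 0, 0, 1, 1, 1])

loss = consensus_loss(individual_z, group_ids, group_z) + 0.1 * diversity_loss(group_z)
loss.backward()  # gradients flow to both sets of intention embeddings
print(float(loss))
```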
In addition, because communication among agents helps them understand higher-level attributes of others, such as capabilities and intentions, this thesis uses centralized implicit communication as the cornerstone of behavior, capability, and intention understanding. A centralized implicit communication framework, however, is limited in real-world problems by the curse of dimensionality, privacy protection, and the single point of failure. To fully decentralize training while keeping the performance of decentralized algorithms as close as possible to that of centralized ones, the thesis proposes a flexible and fully decentralized actor-critic cooperative MARL framework (F2A2). To address the biased gradient computation caused by block coordinate gradient descent in previous works, F2A2 adopts an optimization framework based on primal-dual hybrid gradient descent. It further reduces the communication load introduced by decentralization through a parameter sharing mechanism and a novel agent modeling method based on theory of mind and online supervised learning. Numerical experiments on complex cooperative tasks show that cooperation promotion MARL algorithms instantiated from the framework significantly outperform the baselines with far less communication overhead, and approach, or in some cases even exceed, the performance of the centralized algorithm.
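Of F2A2's components, the agent modeling idea is the easiest to isolate: each agent maintains a small local model of every teammate and fits it by online supervised learning on the actions it observes, so decisions need not rely on constant communication. The sketch below shows that single component under assumed interfaces and names; the primal-dual hybrid gradient optimization and the parameter sharing mechanism are deliberately left out.

```python
import torch
import torch.nn as nn

class TeammateModel(nn.Module):
    """Local model of one teammate: predicts its action from the shared
    observation, updated online as each new action is observed."""
    def __init__(self, obs_dim, n_actions, lr=1e-2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU(),
                                 nn.Linear(32, n_actions))
        self.opt = torch.optim.SGD(self.parameters(), lr=lr)
        self.loss_fn = nn.CrossEntropyLoss()

    def predict(self, obs):
        return torch.softmax(self.net(obs), dim=-1)

    def online_update(self, obs, observed_action):
        """One supervised step on the latest (observation, action) pair."""
        loss = self.loss_fn(self.net(obs), observed_action)
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
        return loss.item()

# Online loop on synthetic data: the teammate picks action 1 when obs[0] > 0.
model = TeammateModel(obs_dim=4, n_actions=3)
for _ in range(200):
    obs = torch.randn(1, 4)
    action = torch.tensor([1 if obs[0, 0] > 0 else 2])
    model.online_update(obs, action)
print(model.predict(torch.tensor([[1.0, 0.0, 0.0, 0.0]])))  # should favor action 1
```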
In summary, this thesis aims to help multi-agent systems develop cooperative behaviors more quickly and consistently by explicitly equipping agents to understand the behaviors, capabilities, and intentions of others on the basis of implicit communication. For each level of understanding, the representative problem of the corresponding scenario is solved through trust-region decomposition, action space representation learning, and structured cooperation emergence, respectively. In addition, to address the many limitations of centralized implicit communication in real-world problems, the thesis proposes a fully decentralized implicit communication framework that allows cooperation promotion MARL algorithms, including but not limited to those proposed here, to be trained and deployed more effectively in real-world scenarios.

Keywords/Search Tags: Multi-Agent Reinforcement Learning, Learning to Cooperate, Agent Modeling, Learning to Communicate, Distributed Optimization