
Research On Key Technologies Of Reinforcement Learning For Cooperative Multi-Agent System

Posted on: 2024-03-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Zhao
Full Text: PDF
GTID: 1528306932458854
Subject: Information and Communication Engineering
Abstract/Summary:
A Cooperative Multi-Agent System (CMAS) is a system in which multiple agents collaborate to achieve a shared objective. The development of deep reinforcement learning has driven notable advances in CMAS and established Cooperative Multi-Agent Reinforcement Learning (CMARL) as a distinct research domain. The challenges in CMARL lie in attaining precise cooperation and efficient information sharing among agents while balancing individual rewards against collective team rewards. To address these problems, current CMARL algorithms fall broadly into three paradigms: centralized training with centralized execution, decentralized training with decentralized execution, and centralized training with decentralized execution. Each paradigm has its own advantages and disadvantages, which makes each suitable for different practical scenarios. Despite the promising applications of CMARL, several hurdles remain in real-world implementations, such as constructing representative state representations, formalizing problem descriptions, improving training efficiency, and deploying agents in practical scenarios. Further research and experiments are needed to meet the varying demands of different application contexts. This thesis studies key reinforcement learning techniques for cooperative multi-agent systems. Its main research content and innovations are as follows:

· A mirror loss function for reinforcement learning is proposed. State representation is an important factor in the training efficiency of reinforcement learning algorithms. Traditional deep reinforcement learning algorithms often fail to capture the complete information contained in the state space, which limits the intelligence of the trained agents. For example, in the Atari environment, where the game interface serves as the state input, an agent trained on the original interface would struggle to choose appropriate actions when the interface is flipped horizontally. We propose a mirror loss function that enforces consistency between an agent's actions in the mirrored environment and its actions in the original environment. With this loss function, the agent can quickly perceive the valid information in the state space and gain a deeper understanding of the environment's underlying dynamics rather than merely overfitting to the game environment. This leads to better final performance and improved learning outcomes.

· A distributional cooperative multi-agent reinforcement learning training method is proposed. Existing MARL algorithms typically treat individual and global state-action values as fixed quantities. Given the inherent randomness of the environment, however, it is more appropriate to model state-action values as probability distributions. This work uses cardinality and its associated probability to describe the distribution of state-action values. To fit the joint state-action value distribution with value decomposition methods, we employ a distribution mixture network that combines the individual state-action value distributions. Importantly, these operations must adhere to the individual-global maximization constraint. We introduce five fundamental operations on distributions: weighting, biasing, convolution, projection, and function transformation. Under certain conditions, these operations comply with the individual-global maximization constraint. Using these operations, the distributional mixture network transforms value decomposition-based cooperative MARL algorithms into a distributional form. Compared with their non-distributional counterparts, the distributional algorithms demonstrate superior performance across a variety of environments.

· A novel cooperative multi-agent reinforcement learning training framework is proposed. Current approaches often rely on either value decomposition or centralized actor-critic frameworks, both of which are designed specifically for centralized training with decentralized execution. In contrast, our framework uses a centralized teacher and decentralized students. The teacher model is trained with a centralized algorithm, while the student modules use their own observations to distill knowledge from the teacher's state-action pairs. With this training method, any centralized training algorithm can meet the requirements of decentralized execution tasks through knowledge distillation. Moreover, our framework demonstrates superior experimental performance compared with the customized frameworks designed for specific tasks.

· A problem definition for cooperative multi-agent reinforcement learning systems with unexpected crashes is given, and a coach-assisted MARL method is proposed. Traditional experiments in cooperative MARL take place primarily in simulated environments, where agents operate without disruption. In real-world applications, however, software or hardware failures often cause agents to crash, which impedes collaboration among agents during training and results in a notable performance decline. To address this challenge, we add a coach agent to the multi-agent system. The coach agent dynamically adjusts the agent dropout rate based on the current performance of the system, which enables the agents to adapt and learn to cooperate effectively in the presence of unexpected crashes. Extensive experiments in various environments demonstrate that our approach achieves stable performance under different crash rates.
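
To make the mirror loss from the first contribution more concrete, the following is a minimal sketch under stated assumptions, not the formulation used in the thesis. It assumes a horizontal flip of a 2-D frame, a three-action set in which LEFT and RIGHT swap under the flip, and a KL divergence as the consistency measure. The names `mirror_state`, `mirror_loss`, `symmetric_policy` and `biased_policy` are hypothetical and introduced only for illustration.

```python
import math

# Hypothetical action set (an assumption, not from the thesis).
NOOP, LEFT, RIGHT = 0, 1, 2
# Assumed mapping: action a in the original frame corresponds to
# MIRRORED_ACTION[a] in the horizontally flipped frame.
MIRRORED_ACTION = [NOOP, RIGHT, LEFT]


def mirror_state(frame):
    """Flip a 2-D frame (a list of rows) horizontally."""
    return [list(reversed(row)) for row in frame]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mirror_loss(policy_logits, frame):
    """KL divergence between the policy on the original frame and the policy
    on the flipped frame, re-indexed to the original action labels."""
    p = softmax(policy_logits(frame))
    q_flipped = softmax(policy_logits(mirror_state(frame)))
    # Probability the flipped-frame policy gives to the counterpart of action a.
    q = [q_flipped[MIRRORED_ACTION[a]] for a in range(len(p))]
    return sum(pa * math.log(pa / qa) for pa, qa in zip(p, q) if pa > 0.0)


def symmetric_policy(frame):
    """Toy policy: move toward the side of the frame that holds more mass."""
    half = len(frame[0]) // 2
    left_mass = sum(sum(row[:half]) for row in frame)
    right_mass = sum(sum(row[-half:]) for row in frame)
    return [0.0, float(left_mass), float(right_mass)]


def biased_policy(frame):
    """Toy policy that prefers LEFT regardless of the frame contents."""
    return [0.0, 2.0, 0.0]


frame = [[1, 0, 0, 0],
         [1, 1, 0, 0]]
print("symmetric policy loss:", mirror_loss(symmetric_policy, frame))
print("biased policy loss:", mirror_loss(biased_policy, frame))
```

In this sketch a policy that responds consistently to the flipped frame incurs a loss near zero, while a policy that ignores the flip is penalized. In training, such a term would presumably be added to the usual reinforcement learning objective with some weighting coefficient, which is also an assumption here.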
Keywords/Search Tags: Multi-Agent System, Reinforcement Learning, Cooperation, State Representation, Knowledge Distillation, Distributional Reinforcement Learning, Fault Tolerance