
Research On The Key Technology Of Multi-agent Collaborative Algorithm Based On Deep Reinforcement Learning

Posted on: 2024-06-21    Degree: Doctor    Type: Dissertation
Country: China    Candidate: S Y Wang    Full Text: PDF
GTID: 1528307301476724    Subject: Computer Science and Technology
Abstract/Summary:
Reinforcement learning, owing to its capacity to address sequential decision problems without reliance on labeled data, has demonstrated its utility in diverse complex decision tasks since its inception, and has been extended to practical scenarios such as game character control and simulation training. In the real world, multi-agent systems with collaborative objectives are pervasive, encompassing Wi-Fi systems, traffic light control systems, routing management systems, autonomous driving systems, and unmanned aerial vehicle systems, among others. Within these systems, the individual controlled entities can be modeled with reinforcement learning, forming multi-agent collaborative systems, and cooperative policies for achieving common goals can be obtained by training these entities through interaction. In recent years, deep learning techniques, renowned for their potent function approximation and feature abstraction capabilities, have continuously progressed and achieved notable success across various artificial intelligence domains. Simultaneously, the extension of deep learning into reinforcement learning has significantly propelled the advancement of both reinforcement learning and multi-agent reinforcement learning, facilitating adept solutions to intricate collaborative problems within high-dimensional spaces and emerging as a focal point of contemporary research. Nonetheless, prevailing multi-agent collaborative algorithms often grapple with low training efficiency and suboptimal sample utilization. These challenges stem from the inherent fitting errors of neural networks, the tenuous stability of cooperative policies, and the expansive space of the joint policy network, which impede further development and application.

This dissertation focuses squarely on enhancing the training efficiency of multi-agent reinforcement learning algorithms. Grounded in specific application scenarios, such as multi-agent particle environments and the StarCraft II game environment, it systematically addresses four pivotal issues contributing to inefficient training: inadequate utilization of environmental information, substantial temporal-difference errors during model training, insufficient representation capabilities in value decomposition methods, and distribution shift in offline datasets. The primary innovations are as follows.

1. To address the issue of inadequate utilization of environmental information, this dissertation proposes a new multi-agent collaborative algorithm, AWGmix, based on an Abbreviated Weighted Graph information-enhanced Mixing module. The agents are assumed to be partially observable: each agent has its own field of view and can only observe information about other agents or the environment within that field of view. AWGmix first constructs a collaborative graph of the agents based on their positional information. Leveraging the Floyd shortest path algorithm, it calculates the node hops between any two agents, establishing virtual connection relationships that extend beyond the immediate field of view. This process culminates in an enhanced collaborative graph model, with computed weights assigned to the agent connection edges of the augmented graph. Concurrently, AWGmix introduces an attribution module designed to assimilate the action information of other agents, thereby empowering the currently controlled agent to make more informed decisions. Experimental results demonstrate that, compared to mainstream multi-agent collaborative algorithms based on graph neural networks, AWGmix performs better in multi-agent particle environments and the StarCraft II game environment.
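To illustrate how such an enhanced collaborative graph might be built, the following Python sketch constructs the observation-limited adjacency from agent positions, runs the Floyd shortest-path algorithm to obtain hop counts between all agent pairs, and converts hops into edge weights. The function name, the view-range threshold, and the hop-to-weight rule are illustrative assumptions, not the dissertation's exact design.

```python
# Hypothetical sketch of the graph-enhancement step described for AWGmix.
import numpy as np

def build_enhanced_graph(positions, view_range):
    """positions: (n_agents, 2) array of agent coordinates.
    Returns an (n_agents, n_agents) weight matrix for the augmented collaborative graph."""
    n = positions.shape[0]
    # 1. Observable graph: agents are directly connected only within the field of view.
    dist = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    hops = np.where(dist <= view_range, 1.0, np.inf)
    np.fill_diagonal(hops, 0.0)

    # 2. Floyd(-Warshall) shortest paths give hop counts between any two agents,
    #    creating virtual connections beyond the immediate field of view.
    for k in range(n):
        hops = np.minimum(hops, hops[:, [k]] + hops[[k], :])

    # 3. Assign edge weights that decay with hop count (one plausible choice).
    weights = np.where(np.isfinite(hops) & (hops > 0), 1.0 / hops, 0.0)
    return weights
```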
2. In response to the training inefficiency arising from substantial temporal-difference errors during model training, this dissertation introduces a novel acceleration algorithm termed RA3, founded on Anderson acceleration with adaptive regularization. A comprehensive analysis is undertaken to discern the origins of temporal-difference errors during the updates of multi-agent collaborative algorithms. The update process of the centralized value function is re-abstracted as a fixed-point iteration, and the Anderson acceleration algorithm is incorporated to compute a more precise value function estimate, thereby bolstering the training efficiency of collaborative algorithms. Moreover, this dissertation scrutinizes the numerical divergence issues that may emerge when updating with Anderson acceleration, and proposes adaptive coefficient calculation and algorithm restart mechanisms to stabilize the training process. Experimental results substantiate the efficacy of RA3 in significantly enhancing the training efficiency of existing collaborative algorithms. This contribution not only presents a practical solution to temporal-difference errors but also introduces a novel perspective on applying numerical computation methods to multi-agent reinforcement learning.

3. To address the issue of insufficient representation capabilities in value decomposition methods, this dissertation introduces a novel algorithm called VDF, for Value Decomposition Fusion in multi-agent reinforcement learning. VDF inherits the complete centralized value function representation capability of certain existing value decomposition methods, while also achieving the high training efficiency of methods that satisfy the Individual-Global-Maximization principle, whose representational power is limited by their neural network structures. In addition, VDF is designed to adaptively fuse various existing value decomposition policies without creating an entirely new neural network, thereby enhancing the training efficiency of multi-agent collaborative policies. This dissertation explains the learning process and underlying mechanisms of VDF using a simple matrix game, and demonstrates through more complex collaborative tasks that, even without designing complex information fusion networks, VDF significantly improves the performance of multi-agent collaboration based on value function decomposition.

4. To address the training inefficiency caused by distribution shift in offline datasets for multi-agent reinforcement learning, this dissertation proposes a Noise-Injection-based State enhancement algorithm, called NIS, which enhances state representations in a self-supervised manner. NIS starts by enhancing the global state information of the multi-agent system to locally expand the offline dataset. This enhancement enables offline algorithms to better handle out-of-distribution (OOD) data, reduces sensitivity to such unseen data, and yields more accurate centralized value function estimates during training. Experimental results demonstrate that, when training multi-agent collaborative models on offline datasets, NIS significantly improves the performance of both mainstream offline and online collaborative algorithms.
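As a minimal illustration of the noise-injection state-enhancement idea, the sketch below perturbs the global states of an offline batch with Gaussian noise and appends the copies to the original data. The noise scale, the number of perturbed copies, and the function name are assumptions made for illustration, not the dissertation's exact procedure.

```python
# A minimal sketch of noise-injection-based state enhancement for an offline batch.
import numpy as np

def augment_global_states(states, noise_std=0.01, n_copies=1, seed=None):
    """states: (batch, state_dim) array of global states from an offline dataset.
    Returns the batch locally expanded with noise-perturbed copies."""
    rng = np.random.default_rng(seed)
    perturbed = [states + rng.normal(0.0, noise_std, size=states.shape)
                 for _ in range(n_copies)]
    # Training the centralized value function on both the original and the perturbed
    # states is intended to reduce sensitivity to out-of-distribution inputs.
    return np.concatenate([states, *perturbed], axis=0)
```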
Keywords/Search Tags:Deep Reinforcement Learning, Multi-Agent System, Sequential Decision Making Process, Multi-Agent Cooperation