Multi-Agent Deep Reinforcement Learning For Sparse Interactions

Posted on: 2021-03-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liu
GTID: 2428330647450743
Subject: Computer technology
Abstract/Summary:
In large-scale multi-agent systems, the large number of agents and the complex interaction relationships among them pose great challenges to policy learning, so simplifying the policy learning process is an important research problem. Traditional multi-agent deep reinforcement learning methods mainly focus on tightly coupled scenarios, in which agents interact with each other closely. In the real world, however, interactions in multi-agent systems are often sparse: agents do not need to interact at every step, nor do they need to interact with all other agents. Exploiting this sparsity can greatly simplify policy learning. Traditional methods use predefined rules to define interaction states and relationships, but such methods are difficult to apply directly to large-scale multi-agent systems. This thesis therefore focuses on sparse interactions in large-scale multi-agent systems. We propose a knowledge transfer method based on temporal sparsity and a game abstraction algorithm based on spatial sparsity. We also apply these methods to the electromagnetic spectrum confrontation scenario and design a simulation system of electromagnetic spectrum confrontation based on deep reinforcement learning, combining theory with practice. The main contributions are as follows:

1. Considering temporal sparsity in multi-agent systems, we propose a novel Markov decision process (MDP) similarity metric to identify the interaction areas in a multi-agent system, and, based on this metric, a novel policy transfer algorithm. Traditional metrics based on bisimulation are very expensive to compute. This thesis instead introduces the N-Step Return to represent the local dynamics of the environment and proposes two transfer methods built on it. The direct value function transfer method reuses the value function of the source task directly in the target task. The N-Step Return-based transfer method first measures the similarity between MDPs, which effectively avoids negative transfer and further improves learning performance. Experiments in a variety of game scenarios show that the N-Step Return-based method not only obtains the optimal policy but also greatly improves learning efficiency.

2. Considering spatial sparsity in multi-agent systems, we propose a novel game abstraction method based on a two-stage attention mechanism that combines hard attention with soft attention. The hard-attention stage determines which agents each agent needs to interact with, so that unrelated agents are removed from the game directly; this simplifies policy learning and makes the method better suited to large-scale multi-agent systems. The soft-attention stage then learns the relationship weights among the remaining agents, further optimizing the learning process. In addition, each agent obtains the contributions of the other agents through a graph neural network, and two novel multi-agent policy learning algorithms are proposed: the communication-based method GA-Comm and the cooperation-based method GA-AC. Their effectiveness is verified in the Traffic Junction and Predator-Prey environments.

3. Considering sparse interactions in both time and space, we apply the two algorithms above to large-scale electromagnetic spectrum confrontation scenarios, and design and implement a simulation system of electromagnetic spectrum confrontation based on reinforcement learning. The system reduces the learning and research costs of developing systems with reinforcement learning and helps accelerate the practical application of artificial intelligence. The final policy is evaluated, explained, and optimized through the task duplexing function to meet the special requirements of this scenario.
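The N-Step Return idea in contribution 1 can be illustrated with a minimal tabular sketch. It assumes grid-cell states, trajectories collected under the same behaviour policy in both tasks, and a per-state comparison of averaged truncated n-step returns against a fixed threshold. The helper names (`n_step_return`, `return_profile`, `transfer_q`) and the threshold rule are hypothetical and are not taken from the thesis, whose actual metric may differ.

```python
from typing import Dict, List, Tuple

State = Tuple[int, int]                       # hypothetical grid-cell state
Action = str
Trajectory = Tuple[List[State], List[float]]  # states[t] and the reward received after acting in it


def n_step_return(rewards: List[float], t: int, n: int, gamma: float) -> float:
    """Discounted sum of up to n rewards starting at step t (truncated, no bootstrap)."""
    total, discount = 0.0, 1.0
    for k in range(t, min(t + n, len(rewards))):
        total += discount * rewards[k]
        discount *= gamma
    return total


def return_profile(trajs: List[Trajectory], n: int, gamma: float) -> Dict[State, float]:
    """Average n-step return observed from each visited state: a cheap summary of local dynamics."""
    sums: Dict[State, float] = {}
    counts: Dict[State, int] = {}
    for states, rewards in trajs:
        for t, s in enumerate(states):
            sums[s] = sums.get(s, 0.0) + n_step_return(rewards, t, n, gamma)
            counts[s] = counts.get(s, 0) + 1
    return {s: sums[s] / counts[s] for s in sums}


def transfer_q(source_q: Dict[Tuple[State, Action], float],
               source_prof: Dict[State, float],
               target_prof: Dict[State, float],
               threshold: float) -> Dict[Tuple[State, Action], float]:
    """Reuse source Q-values only where source and target n-step returns agree.

    States whose profiles differ by more than `threshold` are treated as
    interaction areas and left for the target task to learn. With
    threshold=float("inf") this reduces to direct value-function transfer
    for every state seen in both tasks.
    """
    out: Dict[Tuple[State, Action], float] = {}
    for (s, a), q in source_q.items():
        if s in source_prof and s in target_prof and abs(source_prof[s] - target_prof[s]) <= threshold:
            out[(s, a)] = q
    return out


# Usage: a hypothetical second agent changes the outcome at cell (0, 2) in the target task.
source = [([(0, 0), (0, 1), (0, 2)], [0.0, 0.0, 1.0])]
target = [([(0, 0), (0, 1), (0, 2)], [0.0, 0.0, -1.0])]
src_p = return_profile(source, n=1, gamma=0.9)
tgt_p = return_profile(target, n=1, gamma=0.9)
q = {((0, 0), "right"): 0.81, ((0, 1), "right"): 0.9, ((0, 2), "stay"): 1.0}
print(transfer_q(q, src_p, tgt_p, threshold=0.1))  # {((0, 0), 'right'): 0.81, ((0, 1), 'right'): 0.9}
```

The filtered transfer keeps the two cells whose local returns match and drops the cell where the interaction occurs, which is the kind of negative transfer that copying the whole source value function would cause. Larger `n` makes the profile sensitive to differences further ahead, at the cost of flagging more states as dissimilar.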
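The two-stage attention in contribution 2 can likewise be sketched for a single agent. Several assumptions here do not come from the thesis: the hard stage is a deterministic linear gate on concatenated embeddings (a stand-in for a learned, sampled gate such as Gumbel-softmax), and the soft stage uses the raw embeddings as queries, keys, and values instead of learned projections. The names `hard_attention` and `soft_attention` are hypothetical, and the code is not GA-Comm or GA-AC.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def hard_attention(i: int, embeddings: List[Vector], hard_w: Vector) -> List[int]:
    """Stage 1: binary keep/drop decision for every other agent j.

    Hypothetical gate: keep j if sigmoid(hard_w . [h_i, h_j]) > 0.5,
    which is equivalent to the raw score being positive.
    hard_w must have length 2 * embedding dimension.
    """
    kept = []
    for j, h_j in enumerate(embeddings):
        if j == i:
            continue
        score = dot(hard_w, list(embeddings[i]) + list(h_j))
        if score > 0.0:
            kept.append(j)
    return kept


def soft_attention(i: int, embeddings: List[Vector],
                   kept: List[int]) -> Tuple[List[float], Dict[int, float]]:
    """Stage 2: softmax over h_i . h_j for kept agents only, then a weighted sum of their h_j."""
    dim = len(embeddings[i])
    if not kept:
        return [0.0] * dim, {}
    scores = [dot(embeddings[i], embeddings[j]) for j in kept]
    m = max(scores)                            # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = {j: e / z for j, e in zip(kept, exps)}
    context = [sum(weights[j] * embeddings[j][d] for j in kept) for d in range(dim)]
    return context, weights


# Usage: agent 0 looks at three neighbours with 2-D embeddings.
h = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [-2.0, 0.0]]
w_hard = [1.0, 0.0, 1.0, 0.0]
kept = hard_attention(0, h, w_hard)            # kept == [1, 2]; agent 3 is pruned
context, weights = soft_attention(0, h, kept)
print(kept, weights, context)
```

Because agent 3 is removed in the hard stage, it never enters the softmax, so its weight is exactly zero rather than merely small; this is what shrinks the abstracted game as the number of agents grows. The returned context vector plays, in simplified form, the role of the per-agent contribution that a graph neural network layer would aggregate.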
Keywords/Search Tags: Multi-Agent System, Deep Learning, Reinforcement Learning, Sparse Interactions