
Multi-agent Confrontation Algorithm Based On Reinforcement Learning

Posted on: 2022-11-19 | Degree: Master | Type: Thesis
Country: China | Candidate: S N Hou
GTID: 2518306743951469 | Subject: Control Engineering
Abstract/Summary:
A multi-agent system (MAS) is a computerized system composed of multiple agents that interact with an environment. Because deep reinforcement learning has strong exploration and decision-making capabilities, it has become the mainstream approach to intelligent decision-making in multi-agent systems. With the continued development of artificial intelligence, multi-agent reinforcement learning has been widely applied, and the problem of collaborative confrontation has considerable research value. Deep reinforcement learning research on multi-agent collaborative confrontation aims to obtain an optimal strategy for achieving a goal through the interaction between a formation of agents and the environment. The evolution of a collaborative confrontation environment depends on the actions executed by all agents. Because there are many agents, and some of them are not controlled by one's own side, the environment is complex, dynamic, and non-stationary. Moreover, the complexity of a multi-agent system grows with the number of agents, which produces a huge exploration space; because the policies keep changing over this space during learning, samples in the experience replay buffer are used inefficiently. These problems seriously degrade the performance of deep reinforcement learning algorithms on MAS. This thesis reviews the historical development of multi-agent reinforcement learning and builds on existing work. Its main research content consists of the following two parts:

(1) To address the complex, dynamic, and non-stationary environment, a multi-agent collaborative confrontation algorithm with behavior prediction of unknown agents is proposed. The main body of the algorithm adopts a value decomposition network structure, combines supervised learning with reinforcement learning, and adds a novel unknown-agent behavior prediction module. This module builds and trains a supervised auxiliary model from the historical features and executed actions of the unknown agents in order to predict their actions. The value decomposition network merges the output of the prediction module with the environment state information to make decisions. Experiments show that the algorithm outperforms current mainstream baseline algorithms in the SMAC StarCraft II environment and the MaCA formation confrontation environment.

(2) To address the low efficiency of multi-agent experience replay samples, a robust multi-layer construction method for multi-agent reinforcement learning experience replay is proposed. The method gives experience replay a three-level structure. First, the storage scheme of the replay buffer is improved with the reservoir sampling algorithm. Next, a sample set that encourages exploration is screened out with a similarity-measure screening method. Finally, importance sampling based on policy change is performed on this set, which improves the stability and credibility of the samples. Experiments show that the method performs well in both the SMAC and MaCA environments.
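A minimal sketch of how an unknown-agent behavior predictor could feed a value decomposition network is given below. It assumes the additive VDN mixing rule Q_tot(s, a) = sum_i Q_i(o_i, a_i), a simple count-based classifier standing in for the thesis's supervised auxiliary model (whose architecture the abstract does not specify), and linear per-agent Q-functions. The names `BehaviorPredictor`, `agent_q`, `vdn_q_tot`, `greedy_joint_action` and the toy weights are hypothetical.

```python
from collections import Counter, defaultdict


class BehaviorPredictor:
    """Hypothetical stand-in for the supervised auxiliary model: a count-based
    classifier mapping an unknown agent's (discretized) history feature to a
    distribution over its next action."""

    def __init__(self, n_actions):
        self.n_actions = n_actions
        self.counts = defaultdict(Counter)

    def fit(self, histories, actions):
        # Supervised training on (history feature, executed action) pairs.
        for h, a in zip(histories, actions):
            self.counts[h][a] += 1

    def predict_proba(self, history):
        c = self.counts.get(history, Counter())
        total = sum(c.values())
        if total == 0:  # unseen history: fall back to a uniform guess
            return [1.0 / self.n_actions] * self.n_actions
        return [c[a] / total for a in range(self.n_actions)]


def agent_q(weights, obs, opp_proba):
    """Linear Q_i(o_i, .) over the agent's own observation concatenated with
    the predicted unknown-agent action distribution (the 'merge' step)."""
    x = list(obs) + list(opp_proba)
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def vdn_q_tot(per_agent_q, joint_action):
    # VDN mixing: Q_tot(s, a) = sum_i Q_i(o_i, a_i)
    return sum(q[a] for q, a in zip(per_agent_q, joint_action))


def greedy_joint_action(per_agent_q):
    # Because Q_tot is additive, each agent's own argmax also maximizes Q_tot.
    return [max(range(len(q)), key=q.__getitem__) for q in per_agent_q]


predictor = BehaviorPredictor(n_actions=2)
predictor.fit(["advance", "advance", "retreat"], [1, 1, 0])
opp = predictor.predict_proba("advance")  # [0.0, 1.0]

weights = [
    [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]],   # agent 0, 3 actions
    [[0, 0, 1, 0], [1, 1, 0, 0], [0, 0, 0, -1]],  # agent 1, 3 actions
]
observations = [[0.5, 0.2], [0.3, 0.4]]
per_agent_q = [agent_q(w, o, opp) for w, o in zip(weights, observations)]
best = greedy_joint_action(per_agent_q)
print(best, vdn_q_tot(per_agent_q, best))
```

Here the greedy joint action is `[2, 1]`. The additive decomposition is what lets each agent act on its own Q-values at execution time while training uses the shared Q_tot; a practical version would typically replace the count table and linear Q-functions with neural networks.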
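The first two levels of the replay construction can be sketched as follows. The storage level uses the classic reservoir sampling Algorithm R; the screening level assumes, as a hypothetical choice, cosine similarity between state feature vectors with a fixed threshold, since the abstract names neither the similarity measure nor its parameters. `ReservoirReplayBuffer`, `screen_diverse`, and the 0.9 threshold are illustrative only.

```python
import math
import random


class ReservoirReplayBuffer:
    """Fixed-capacity replay buffer filled by reservoir sampling (Algorithm R).
    After n insertions, each transition seen so far is retained with
    probability capacity / n, instead of only the most recent ones surviving
    as in a FIFO buffer."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.data = []
        self.n_seen = 0
        self.rng = random.Random(seed)

    def add(self, transition):
        self.n_seen += 1
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            j = self.rng.randrange(self.n_seen)  # uniform over [0, n_seen)
            if j < self.capacity:
                self.data[j] = transition


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def screen_diverse(transitions, key, threshold):
    """Hypothetical similarity screening: greedily keep a transition only if
    its feature vector's similarity to every already-kept one is below
    `threshold`, so near-duplicate experiences are filtered out."""
    kept = []
    for t in transitions:
        if all(cosine_similarity(key(t), key(k)) < threshold for k in kept):
            kept.append(t)
    return kept


buffer = ReservoirReplayBuffer(capacity=5, seed=0)
for step in range(100):
    buffer.add({"step": step, "state": [math.cos(step), math.sin(step)]})
diverse = screen_diverse(buffer.data, key=lambda t: t["state"], threshold=0.9)
print(len(buffer.data), len(diverse))
```

Reservoir sampling keeps the buffer an unbiased sample of the whole interaction history, so older experience is not systematically discarded; the screening step then trades some of that coverage for diversity.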
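The third level, importance sampling based on policy change, can be sketched as a per-sample ratio between the probability the current policy assigns to the stored actions and the probability recorded when the sample was collected. This is a minimal sketch under stated assumptions: the product-over-agents joint ratio, the clipping constant, and the max-normalization (a convention borrowed from prioritized experience replay) are illustrative choices not given in the abstract, and `policy_change_weights` and `weighted_td_loss` are hypothetical names.

```python
def policy_change_weights(old_probs, new_probs, clip=10.0):
    """Importance weights for replayed multi-agent samples.

    old_probs[i][k]: probability of agent k's stored action under the policy
    that collected sample i (recorded at insertion time).
    new_probs[i][k]: probability of the same action under the current policy.
    The joint ratio is the product over agents (assumes independent
    decentralized policies). Ratios are clipped at `clip`, then divided by
    the largest weight so every positive weight lies in (0, 1].
    """
    weights = []
    for olds, news in zip(old_probs, new_probs):
        ratio = 1.0
        for p_old, p_new in zip(olds, news):
            ratio *= p_new / p_old
        weights.append(min(ratio, clip))
    max_w = max(weights)
    if max_w == 0:  # current policy never picks any stored action
        return weights
    return [w / max_w for w in weights]


def weighted_td_loss(weights, td_errors):
    # Mean of importance-weighted squared TD errors.
    return sum(w * d * d for w, d in zip(weights, td_errors)) / len(td_errors)


old = [[0.5, 0.5], [0.25, 0.5]]  # two samples, two agents each
new = [[0.5, 0.5], [0.5, 0.5]]
w = policy_change_weights(old, new)
print(w, weighted_td_loss(w, [2.0, 1.0]))  # [0.5, 1.0] 1.5
```

Samples whose actions have become more likely under the current policy receive larger weights, while clipping keeps a single stale sample with a tiny recorded probability from dominating the update.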
Keywords/Search Tags: reinforcement learning, multi-agent system, behavior prediction, experience replay, importance sampling