Research on multi-agent systems has long been an important topic in artificial intelligence. Multi-agent adversarial strategy is one of the key technologies in multi-agent systems, with a wide range of applications in robot-assisted decision-making, distributed control, and military simulation. Existing work shows that reinforcement learning algorithms allow a single agent to learn a good policy in a simple environment. In a multi-agent environment, however, the complexity and uncertainty of the environment give rise to problems such as the lack of efficient communication between agents, sparse reward signals, and instability or failure to converge during learning. In view of these problems, this paper studies inter-agent information exchange and sparse environmental rewards in multi-agent adversarial decision-making. The main work is as follows.

To address the lack of efficient information exchange between agents in a multi-agent adversarial environment, a communication mechanism based on graph neural networks and hierarchical attention is designed. The multi-agent system is first modeled as a graph, and a hierarchical attention method is then used to extract information across agents, combining the attention weights within each agent group with the attention weights between groups. Finally, within the Actor-Critic framework, the communication mechanism is embedded in the Critic network, whose Q values guide the agents' decisions. For each agent, the hierarchical attention mechanism identifies which agents it should interact with, and the graph neural network aggregates the contributions from those agents, so that efficient communication between agents is achieved.

To address sparse rewards in a multi-agent adversarial environment, prioritized experience replay and an exploration method based on parameter-space noise are designed. Prioritized experience replay uses the TD-error as the measure of priority; by introducing sampling probabilities and importance-sampling weights, the transitions in the replay buffer are ranked so that the samples drawn during training are those most valuable for learning. To encourage broader exploration, parameter-space noise is introduced to deepen the agent's exploration and give it the opportunity to try different strategies.

The proposed algorithms are experimentally validated in a mixed cooperative-competitive multi-agent environment and in a cooperative environment. For the communication problem, the algorithm is analyzed in terms of episode reward, the effect of hierarchical attention, and scalability. The experimental results show that the proposed communication mechanism enables agents to attend selectively to other agents, which greatly promotes efficient cooperation and communication between agents. For the sparse-reward problem, comparisons with random sampling and with noise-free and action-space-noise exploration show that the proposed algorithm achieves higher episode rewards, faster convergence, and better stability; it helps agents explore the environment more effectively and addresses the problem of sparse rewards in multi-agent reinforcement learning.
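
The following is a minimal, illustrative sketch of a Critic that uses two-level attention over a group-structured set of agents, in the spirit of the hierarchical attention mechanism described above. It is not the thesis's implementation: the module names, network sizes, grouping scheme, and use of PyTorch's nn.MultiheadAttention are all assumptions made for illustration.

# Sketch: hierarchical (within-group + between-group) attention critic.
# All names, dimensions, and the grouping scheme are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalAttentionCritic(nn.Module):
    def __init__(self, obs_dim, act_dim, groups, hidden=64):
        super().__init__()
        self.groups = groups                      # e.g. [[0, 1], [2, 3]]: agent ids per group
        self.encode = nn.Linear(obs_dim + act_dim, hidden)
        self.q_head = nn.Linear(2 * hidden, 1)    # own embedding + attended message -> Q value
        # single-head attention for within-group and between-group messages
        self.attn_intra = nn.MultiheadAttention(hidden, num_heads=1, batch_first=True)
        self.attn_inter = nn.MultiheadAttention(hidden, num_heads=1, batch_first=True)

    def forward(self, obs, act):
        # obs: [batch, n_agents, obs_dim], act: [batch, n_agents, act_dim]
        h = F.relu(self.encode(torch.cat([obs, act], dim=-1)))    # per-agent embeddings
        intra_msgs, group_summaries = torch.zeros_like(h), []
        # within-group attention: each agent attends to agents in its own group
        for ids in self.groups:
            g = h[:, ids, :]
            msg, _ = self.attn_intra(g, g, g)
            intra_msgs[:, ids, :] = msg
            group_summaries.append(msg.mean(dim=1, keepdim=True))  # one summary per group
        summaries = torch.cat(group_summaries, dim=1)               # [batch, n_groups, hidden]
        # between-group attention: each agent attends to the group summaries
        inter_msg, _ = self.attn_inter(h, summaries, summaries)
        msg = intra_msgs + inter_msg                                 # combine the two levels
        return self.q_head(torch.cat([h, msg], dim=-1))             # [batch, n_agents, 1]

# usage: q = HierarchicalAttentionCritic(10, 4, groups=[[0, 1], [2, 3]])(obs, act)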
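
The prioritized replay described above can be sketched as follows, assuming the standard TD-error-based scheme: priorities proportional to the absolute TD-error, sampling probabilities obtained from the priorities, and importance-sampling weights to correct the resulting bias. The exponents alpha and beta, the buffer layout, and the initial priority are assumptions, not the thesis's exact settings.

# Sketch: TD-error-based prioritized experience replay with importance sampling.
import numpy as np

class PrioritizedReplayBuffer:
    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-6):
        self.capacity, self.alpha, self.beta, self.eps = capacity, alpha, beta, eps
        self.data, self.pos = [], 0
        self.priorities = np.zeros(capacity, dtype=np.float64)

    def add(self, transition):
        # new transitions get the current maximum priority so they are sampled at least once
        max_p = self.priorities.max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        p = self.priorities[:len(self.data)] ** self.alpha
        probs = p / p.sum()                                    # sampling probabilities
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        # importance-sampling weights correct the bias from non-uniform sampling
        weights = (len(self.data) * probs[idx]) ** (-self.beta)
        weights /= weights.max()
        return [self.data[i] for i in idx], idx, weights

    def update_priorities(self, idx, td_errors):
        # priority is proportional to the magnitude of the TD-error
        self.priorities[idx] = np.abs(td_errors) + self.eps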
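
Parameter-space noise, as opposed to action-space noise, perturbs the policy's weights so that exploration is consistent within an episode. A minimal sketch is given below; the noise scale, the example network, and the "perturb a copy, learn with the clean policy" usage are illustrative assumptions rather than the thesis's exact procedure.

# Sketch: parameter-space noise exploration by perturbing a copy of the policy weights.
import copy
import torch
import torch.nn as nn

def perturb_policy(policy: nn.Module, sigma: float = 0.1) -> nn.Module:
    """Return a copy of the policy whose parameters carry additive Gaussian noise."""
    noisy = copy.deepcopy(policy)
    with torch.no_grad():
        for p in noisy.parameters():
            p.add_(sigma * torch.randn_like(p))
    return noisy

# usage: act with the perturbed copy during an episode, update the clean policy from the data
policy = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 4))
behaviour = perturb_policy(policy, sigma=0.1)
obs = torch.randn(1, 10)
action_logits = behaviour(obs)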