As technology advances, the battlefield situation grows increasingly complex, and relying on manual decision-making alone is prone to error. Using reinforcement learning to assist command decisions for collaborative operations in complex environments is a current research direction for addressing this problem. Targeting large-scale reconnaissance decision-making and long-distance cooperative attack assistance decision-making for multi-agent systems in large scenes, this paper proposes training algorithms based on reinforcement learning, designs the corresponding reward functions, and validates them through simulation experiments. The main work of this paper is as follows:

1. For the problem of reinforcement learning formation reconnaissance in large scenes, a PPO-QMIX multi-agent formation reconnaissance algorithm is designed. The behavior of the multi-agent system is divided into two parts: formation adjustment and synchronous movement reconnaissance. The multi-agent reinforcement learning algorithm QMIX is used to adjust the formation, and the reinforcement learning algorithm PPO is used to carry out the marching reconnaissance. The effectiveness of the algorithm is verified by several sets of simulation experiments in different scenarios.

2. For the problem of multi-target, long-distance reinforcement learning cooperative attack assistance, a PPO-QMIX intelligent attack assistance decision-making algorithm is designed. The behavior of the multi-agent system is divided into two parts: overall path planning and cooperative operation. The PPO algorithm is used to make the shortest-path planning decision, and the QMIX algorithm is used to make the optimal operational decision. In addition, the state boundary between the two parts of the multi-agent system's behavior is set, and the conditions for switching between the decision algorithms are given. The effectiveness of the algorithm is verified by several sets of simulation experiments in different scenarios.

3. For the combined multi-agent reconnaissance and attack assistance decision problem in large scenes, a two-stage training framework is designed based on the algorithms from the first two parts. In the first stage, the multi-agent formation reconnaissance reinforcement learning algorithm is used to train a network model that completes the reconnaissance task. In the second stage, this reconnaissance model performs the reconnaissance; then, based on the detected information, the subsequent cooperative operation simulation is carried out, and a cooperative operation model is trained using the multi-agent cooperative operation reinforcement learning algorithm. The effectiveness of the algorithm is verified by several sets of simulation experiments in different scenarios.
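
The switch between the path-planning stage (PPO) and the cooperative-operation stage (QMIX) described in the second part can be illustrated with a minimal sketch. The abstract does not specify the actual state boundary, so the rule used here (distance from the formation centroid to the target falling below a threshold) is purely an assumption, as are the names `select_stage` and `switch_radius`.

```python
import math
from typing import Sequence, Tuple

Position = Tuple[float, float]


def select_stage(
    agent_positions: Sequence[Position],
    target: Position,
    switch_radius: float,
) -> str:
    """Pick which decision algorithm should act in the current state.

    Hypothetical boundary: the formation centroid's distance to the target.
    The paper's real switching condition is not given in the abstract.
    """
    n = len(agent_positions)
    if n == 0:
        raise ValueError("at least one agent position is required")

    # Centroid of the formation in the 2-D plane.
    cx = sum(p[0] for p in agent_positions) / n
    cy = sum(p[1] for p in agent_positions) / n

    # Far from the target: keep planning the path (PPO stage).
    # Inside the assumed boundary: hand over to cooperative operation (QMIX stage).
    distance = math.dist((cx, cy), target)
    return "cooperative_operation" if distance <= switch_radius else "path_planning"
```

In a full system the returned label would select between a trained PPO policy and a trained QMIX policy; here it only marks which stage is active.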