With the rapid development of artificial intelligence (AI) and UAV swarm technology, multi-UAV confrontation has become a hot topic in the military field. Multi-UAV confrontation is an extension of UAV swarm technology in which intelligent algorithms control a UAV swarm in aerial combat. Based on multi-agent deep reinforcement learning, this paper applies a variety of techniques to the problems that arise in multi-UAV confrontation tasks and builds a reinforcement learning model for multi-UAV confrontation. The UAVs are trained with reinforcement learning methods and evaluated in a multi-UAV confrontation simulation environment. The main contributions of this paper are as follows:

1. We formally describe the multi-UAV confrontation task and design the motion model of the UAVs. We build a simulation platform for multi-UAV confrontation and define a target allocation mechanism for the UAVs. We also design control algorithms for both sides: our UAVs adopt the MADDPG algorithm, while the opponents' UAVs follow a rule-based method. Finally, we train our UAVs with MADDPG so that they can effectively intercept the opponents (see the first two sketches after this list).

2. To address the non-stationarity caused by changes in the opponents' policies, we improve the Actor-Critic framework of MADDPG and propose an additional opponent characteristics method for multi-UAV confrontation, which introduces additional opponent features to model the opponents' policies and indirectly predict their behavior. UAVs trained with this method can anticipate changes in the opponents' policies and make decisions in advance, which reduces fluctuations during policy learning and makes the reinforcement learning process more stable (see the third sketch below).

3. In multi-UAV confrontation, the input dimension of the centralized Q network is large, which hinders the learning of cooperative UAV policies and causes reinforcement learning to fall into local optima. We propose a group-based Actor-Critic method for multi-UAV confrontation, which dynamically groups our UAVs and reduces the input dimension of the network. We also introduce a double Q network to model the cooperative policies of our UAVs, so that they converge quickly to optimal cooperative policies and learn more advanced group cooperative behavior during the confrontation (see the fourth sketch below).
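To make the UAV motion model in contribution 1 concrete, below is a minimal 2-D kinematic sketch. The state layout (position, heading, speed) and the bounded controls (turn rate, acceleration) are illustrative assumptions, not the paper's actual model.

```python
# A minimal 2-D kinematic UAV motion model sketch. The state layout
# (x, y, heading, speed) and the control inputs (turn rate, acceleration)
# are assumptions for illustration; the paper's motion model may differ.
import math

def step_uav(state, control, dt=0.1, v_max=30.0, omega_max=math.pi / 6):
    """Advance one UAV by one time step under bounded turn rate and speed."""
    x, y, heading, v = state
    omega, accel = control
    omega = max(-omega_max, min(omega_max, omega))   # clamp turn rate
    v = max(0.0, min(v_max, v + accel * dt))         # clamp speed
    heading = (heading + omega * dt) % (2 * math.pi)
    x += v * math.cos(heading) * dt
    y += v * math.sin(heading) * dt
    return (x, y, heading, v)
```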
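The MADDPG training referenced in contribution 1 pairs decentralized actors with centralized critics that see all agents' observations and actions. The following PyTorch-style sketch shows one update step for a single agent; all network sizes, names (`Actor`, `Critic`, `maddpg_update`), and hyperparameters are assumptions for illustration, not the paper's implementation.

```python
# Minimal MADDPG update sketch (PyTorch). Sizes and names are illustrative.
import torch
import torch.nn as nn

N_AGENTS, OBS_DIM, ACT_DIM = 3, 12, 2   # assumed dimensions
GAMMA = 0.99

class Actor(nn.Module):
    """Decentralized policy: maps one UAV's observation to its action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, ACT_DIM), nn.Tanh())
    def forward(self, obs):
        return self.net(obs)

class Critic(nn.Module):
    """Centralized Q: sees the observations and actions of all UAVs."""
    def __init__(self):
        super().__init__()
        in_dim = N_AGENTS * (OBS_DIM + ACT_DIM)
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                 nn.Linear(128, 1))
    def forward(self, all_obs, all_act):
        return self.net(torch.cat([all_obs, all_act], dim=-1))

def maddpg_update(i, actors, critics, targets, batch, opt_a, opt_c):
    """One gradient step for agent i on a sampled minibatch."""
    obs, act, rew, next_obs, done = batch  # obs: (B, N_AGENTS * OBS_DIM), etc.
    # Critic: regress Q toward the one-step TD target using target networks.
    with torch.no_grad():
        next_act = torch.cat(
            [targets["actor"][j](next_obs[:, j*OBS_DIM:(j+1)*OBS_DIM])
             for j in range(N_AGENTS)], dim=-1)
        y = rew + GAMMA * (1 - done) * targets["critic"][i](next_obs, next_act)
    critic_loss = nn.functional.mse_loss(critics[i](obs, act), y)
    opt_c[i].zero_grad(); critic_loss.backward(); opt_c[i].step()
    # Actor: ascend the centralized Q with agent i's action recomputed.
    act_i = actors[i](obs[:, i*OBS_DIM:(i+1)*OBS_DIM])
    new_act = act.clone()
    new_act[:, i*ACT_DIM:(i+1)*ACT_DIM] = act_i
    actor_loss = -critics[i](obs, new_act).mean()
    opt_a[i].zero_grad(); actor_loss.backward(); opt_a[i].step()
```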
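One plausible reading of the additional opponent characteristics method in contribution 2 is an auxiliary network that predicts observed opponent actions and whose hidden features are appended to the critic input. The sketch below follows that assumption; the actual architecture, loss, and feature injection point in the paper may differ.

```python
# Sketch of the "additional opponent characteristics" idea: an auxiliary
# network predicts opponent actions from observations, and its hidden
# features are fed to the critic as extra input. Sizes and the supervised
# loss are assumptions for illustration.
import torch
import torch.nn as nn

OBS_DIM, OPP_ACT_DIM, FEAT_DIM = 12, 2, 16  # assumed dimensions

class OpponentModel(nn.Module):
    """Infers opponent-policy features from our UAV's observation."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                                     nn.Linear(64, FEAT_DIM), nn.ReLU())
        self.head = nn.Linear(FEAT_DIM, OPP_ACT_DIM)  # predicted opponent action

    def forward(self, obs):
        feat = self.encoder(obs)
        return feat, self.head(feat)

def opponent_model_loss(model, obs, observed_opp_act):
    """Supervised auxiliary loss: match predicted to observed opponent actions."""
    _, pred = model(obs)
    return nn.functional.mse_loss(pred, observed_opp_act)

# The critic then receives [obs, actions, opponent features], so that value
# estimates condition on the inferred opponent policy; this is what lets the
# learner anticipate opponent policy changes and reduces non-stationarity.
```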
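For contribution 3, the sketch below illustrates one way the dynamic grouping and the double Q network could fit together. The nearest-target grouping rule and the min-of-two-critics estimate (as in clipped double Q-learning) are assumptions chosen to show the structure, not the paper's confirmed design.

```python
# Sketch of a group-based critic with a double Q network. The grouping
# criterion (assign each UAV to its nearest target) and the min-of-two-
# critics estimate are illustrative assumptions.
import torch
import torch.nn as nn

GROUP_SIZE, OBS_DIM, ACT_DIM = 2, 12, 2  # assumed dimensions

def dynamic_groups(uav_pos, target_pos):
    """Group UAVs by their nearest target; returns a target index per UAV."""
    dists = torch.cdist(uav_pos, target_pos)      # (n_uav, n_target)
    return dists.argmin(dim=1)                    # group id for each UAV

class GroupCritic(nn.Module):
    """Q network over one group only, shrinking the critic's input dimension."""
    def __init__(self):
        super().__init__()
        in_dim = GROUP_SIZE * (OBS_DIM + ACT_DIM)
        self.q1 = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                nn.Linear(128, 1))
        self.q2 = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                nn.Linear(128, 1))

    def forward(self, group_obs, group_act):
        x = torch.cat([group_obs, group_act], dim=-1)
        # Double Q: take the smaller estimate to curb value overestimation.
        return torch.min(self.q1(x), self.q2(x))
```

Restricting each critic to one group keeps the input dimension fixed as the swarm grows, which is the stated motivation for grouping: a smaller, stabler critic input makes it easier to learn cooperative policies without falling into local optima.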