With the development of artificial intelligence technology, the application of Unmanned Aerial Vehicles (UAVs) in the military field is moving toward intelligence, and AI-based UAV air combat will become an important combat method. Within artificial intelligence, reinforcement learning can automatically learn autonomous action policies in unknown environments through interaction. Taking reinforcement learning as its theoretical tool, this paper studies autonomous maneuver decision-making in three UAV air combat mission scenarios: target tracking, single-UAV air combat, and multi-UAV cooperative air combat. The main research contents and innovations of the paper are as follows:

(1) Aiming at the UAV path planning problem in target tracking tasks, an autonomous tracking path planning model based on a Partially Observable Markov Decision Process (POMDP) is designed, and a model-solving algorithm called the finite action set algorithm is proposed. Through the selection of main belief points and the simplification of action values, the algorithm unifies far-sighted policy with online computation. For the problem of tracking complex moving targets, an Interacting Multiple Model-Unscented Kalman Filter (IMM-UKF) is employed within the POMDP model to update the belief state of the path planning model. Simulation results show that, compared with existing methods, the proposed finite action set algorithm is better suited to online tracking motion decisions under nonlinear observation conditions and can effectively improve the tracking of complex moving targets.

(2) Aiming at the UAV maneuver decision problem in single-UAV air combat missions, a modeling method for air combat maneuver decision based on deep reinforcement learning is proposed. First, the framework of the air combat maneuver decision model is established on the basis of reinforcement learning theory; then, the air combat situation model for the 1v1 scenario is established based on a second-order, three-degree-of-freedom aircraft motion model; finally, to generalize the air combat maneuver policy over high-dimensional continuous state and action spaces, maneuver decision models are established with deep reinforcement learning for both the continuous-state/discrete-action and the continuous-state/continuous-action settings. Simulation results show that the decision real-time performance and the policy robustness of the deep reinforcement learning maneuver decision model are superior to those of the traditional reinforcement learning method and the traversal optimization algorithm.

(3) Aiming at the training non-convergence caused by sparse rewards in the deep reinforcement learning maneuver decision model, a step-by-step training method called basic-confrontation is proposed. The range of state transitions is limited by the law of target movement, which raises the probability of positive reward values in local training and prevents the reinforcement learning process from falling into local optima. For policy learning in continuous action spaces, a training method that probabilistically adds prior samples is proposed to address the large number of invalid samples and the severely sparse rewards produced by traditional noise-based exploration. Simulation results show that the proposed methods effectively improve the training efficiency of the deep reinforcement learning maneuver decision model.

(4) Aiming at the multi-UAV cooperative air combat maneuver decision problem, a modeling method based on Multi-agent Deep Reinforcement Learning (MDRL) is proposed, and a policy coordination mechanism is designed. First, based on a distributed multi-agent system architecture, a modeling method for multi-UAV air combat maneuver decision is proposed and its model framework is designed; then, a communication network for the UAV formation is designed based on a Bi-directional Recurrent Neural Network (BRNN), which addresses the non-stationarity of the reinforcement learning environment and yields a distributed multi-UAV cooperative air combat maneuver decision model; finally, a policy coordination mechanism based on target assignment and air combat situation assessment is designed to coordinate maneuver policies among the UAVs. Simulation results show that the proposed algorithm can quickly generate cooperative air combat maneuver policies and, compared with the traditional method of decomposing multi-UAV cooperative air combat into multiple 1v1 engagements, has clear advantages in policy coordination and decision real-time performance.

(5) A simulation confrontation system for reinforcement-learning-based UAV air combat maneuver decision is designed, which further verifies, across different simulation scenarios, the self-learning ability of the maneuver decision models established in this paper and the effectiveness of their learned policies. The system is a distributed system composed of an air combat environment simulation subsystem, a UAV self-learning subsystem, and a manned aircraft operation simulation subsystem. In the simulation confrontation, the target's maneuver policy is changed from algorithm logic to human control, upgrading the confrontation from "machine-machine" to "man-machine". Simulation results show that the reinforcement-learning-based maneuver decision model established in this paper can continuously update its maneuver policy through interactive learning, obtain an advantageous situation, and defeat the target in air combat.
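The POMDP tracking pattern in contribution (1) — maintain a belief over the target's state, update it with each observation, and pick the next action from a small finite set — can be sketched minimally. The code below is an illustration only, not the thesis model: it uses a coarse 1-D position grid and an exponential detection model as stand-in assumptions, and replaces the IMM-UKF belief update with a simple discrete Bayes filter.

```python
import math

# Hedged sketch of the POMDP tracking loop: predict the belief, correct it
# with a Bayes update, then greedily pick a sensor action from a finite set.
# The grid, motion model, and detection model are illustrative assumptions.

N_CELLS = 5                 # coarse 1-D grid of possible target positions
ACTIONS = [-1, 0, 1]        # finite action set: move sensor left / stay / right

def transition(belief):
    """Predict step: the target random-walks to a neighbouring cell."""
    out = [0.0] * N_CELLS
    for i, p in enumerate(belief):
        for j in (i - 1, i, i + 1):
            if 0 <= j < N_CELLS:
                out[j] += p / 3.0
    s = sum(out)
    return [p / s for p in out]

def update(belief, sensor_pos, detected):
    """Correct step: Bayes update with a range-dependent detection model."""
    post = []
    for i, p in enumerate(belief):
        p_det = math.exp(-abs(i - sensor_pos))   # closer => more likely detected
        post.append(p * (p_det if detected else 1.0 - p_det))
    s = sum(post)
    return [p / s for p in post]

def choose_action(belief, sensor_pos):
    """Greedy one-step lookahead over the finite action set:
    pick the move that maximises expected detection probability."""
    def expected_detect(a):
        pos = min(max(sensor_pos + a, 0), N_CELLS - 1)
        return sum(p * math.exp(-abs(i - pos)) for i, p in enumerate(belief))
    return max(ACTIONS, key=expected_detect)

# One planning cycle: predict, observe, correct, act.
belief = [1.0 / N_CELLS] * N_CELLS
belief = transition(belief)
belief = update(belief, sensor_pos=0, detected=True)
action = choose_action(belief, sensor_pos=0)
print(action)   # the sensor holds near the mass of the belief
```

The thesis' finite action set algorithm additionally selects main belief points and simplifies action values to keep this lookahead tractable online; the one-step greedy choice here is only the simplest stand-in for that step.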
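The prior-sample idea in contribution (3) can also be illustrated schematically. In the sketch below (names, rates, and the toy transition format are assumptions, not the thesis implementation), the replay buffer is filled by drawing, with some probability, a stored prior sample carrying a positive reward instead of a noise-explored transition, so positively rewarded samples stop being vanishingly rare under sparse rewards.

```python
import random

# Hedged illustration of probabilistic prior-sample injection: with
# probability p_prior, a transition from a small prior (demonstration) set
# enters the replay buffer; otherwise a noise-explored transition does.
# Sample contents and rates are illustrative assumptions.

random.seed(0)

prior_samples = [("s_advantage", "a_track", 1.0)]   # rare positive-reward demos

def noisy_rollout_sample():
    # Stand-in for traditional noise-based exploration:
    # almost every transition carries zero reward.
    return ("s_random", random.choice(["a_left", "a_right"]), 0.0)

def fill_buffer(n, p_prior=0.2):
    buffer = []
    for _ in range(n):
        if random.random() < p_prior:
            buffer.append(random.choice(prior_samples))
        else:
            buffer.append(noisy_rollout_sample())
    return buffer

buf = fill_buffer(1000, p_prior=0.2)
positives = sum(1 for (_, _, r) in buf if r > 0)
print(positives / len(buf))   # close to p_prior, instead of near zero
```

Under pure noise-based exploration the positive-reward fraction here would be essentially zero; injecting prior samples in probability keeps it near `p_prior`, which is what makes the sparse-reward policy learning trainable in the first place.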