The autonomy and intelligence of air combat maneuver decision-making are key to deploying UAVs on a real air-to-air battlefield. Under the third wave of artificial intelligence, represented by deep reinforcement learning, it is of great significance to study UAV air combat maneuver decision-making with these new techniques. By surveying the state of artificial-intelligence air combat research in the United States, China, and other major countries, this paper sorts current air combat maneuver decision-making methods into three families: mathematical methods based on game theory, rule-based methods built on machine search, and data-driven learning methods. It then summarizes the historical origins, typical applications, and algorithmic progress of deep reinforcement learning, analyzes the shortcomings of the related research, and finally fixes the core topic of this paper: a UAV air combat maneuver decision-making method based on opponent action prediction. In this context, the paper completes the following work.

(1) Modeling of within-visual-range air combat and implementation of a learning and training simulation environment. By describing the process of modern air combat and arguing the necessity of studying within-visual-range engagements, the research scenario of this paper is determined. The task is then formalized, via the Markov decision process, into the basic elements of a reinforcement learning problem: an action space dominated by seven basic maneuvers is built from the six-degree-of-freedom motion model of the UAV; a state space dominated by deviation angle, relative distance, and height difference is built from the relative spatial positions of the enemy and friendly UAVs; and the terminal reward of the air combat task is designed from the conditions that end an engagement. Based on the real-time 
situation advantage during the engagement, the UAV also receives a shaped real-time reward, and a greedy optimal maneuver-decision strategy is designed for the enemy UAV. Finally, these elements are integrated into a within-visual-range air combat reinforcement learning simulation environment used to train the friendly UAV.

(2) A game-confrontation algorithm based on opponent action prediction. By analyzing the inherent non-stationarity of adversarial games, a general implicit opponent-modeling method combining the D3QN algorithm with opponent action prediction is proposed: hidden-layer features of the predicted opponent action are embedded into the Q-value learning process of D3QN to mitigate the opponent's effect on learning stability. The result is denoted the D3QN-OAP algorithm. First, the network structure is designed with three modules: environmental state feature extraction, opponent action prediction, and Q-value learning. Second, the learning method is designed, i.e., the loss functions of the opponent-action-prediction and Q-value-learning modules, together with an adaptive adjustment mechanism that balances the agent's learning focus across training stages. Finally, the algorithm's performance is verified by simulation in a football environment.

(3) 1v1 close air combat simulation experiments. In a 1v1 air combat scenario, a maneuver decision-making model based on the D3QN-OAP algorithm is built for the red UAV, with a blue UAV following the greedy optimal strategy as its opponent. Starting from an initially unfavorable situation for the red side, the red UAV's decision-making agent is trained. The simulation experiments first verify the effectiveness of enemy-aircraft action prediction; then the learning-process performance and final confrontation level of the D3QN-OAP decision-making model are 
compared and analyzed through algorithm comparison and algorithm ablation experiments, respectively. Finally, the three-dimensional confrontation trajectories of the D3QN-OAP decision-making model are analyzed, showing that it has learned a certain maneuver decision-making ability and can reverse the war situation from an initially unfavorable position.
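As a minimal illustration of the ingredients named above, the sketch below shows, in plain Python, (a) the dueling aggregation and double-DQN target used by D3QN, and (b) a combined loss with an annealed weight on the opponent-action-prediction term. The function names, the linear annealing schedule, and the exact form of the adaptive mechanism are assumptions for illustration only; the thesis's actual D3QN-OAP networks and adjustment mechanism may differ.

```python
def dueling_q_values(value, advantages):
    """Dueling aggregation used by D3QN: Q(s,a) = V(s) + A(s,a) - mean_a A(s,a)."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]

def double_dqn_target(reward, done, gamma, q_online_next, q_target_next):
    """Double-DQN target: the online net selects the next action,
    the target net evaluates it (reduces overestimation bias)."""
    if done:
        return reward
    best_a = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best_a]

def combined_loss(q_loss, pred_loss, step, total_steps):
    """Hypothetical adaptive weighting between Q-learning and opponent-action
    prediction: emphasize prediction early in training, Q-learning later
    (assumed linear schedule, not necessarily the thesis's mechanism)."""
    beta = max(0.0, 1.0 - step / total_steps)
    return q_loss + beta * pred_loss
```

For example, with `value=1.0` and advantages `[1.0, 2.0, 3.0]`, the mean advantage 2.0 is subtracted so the Q-values become `[0.0, 1.0, 2.0]`, keeping V(s) identifiable.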