| Against the background of future combat shifting from information-oriented to intelligence-oriented warfare, this thesis studies the missile attack-defense confrontation problem from the perspective of multi-agent competition by treating missiles as agents, and provides a new approach to intelligent autonomous flight, penetration, and precision strike for aircraft. First, the source, purpose, and significance of the research are outlined. The current state of research abroad and the shortcomings of the main methods for studying aircraft attack-defense confrontation are summarized. The development trends of reinforcement learning and deep reinforcement learning and their applications in game confrontation are also discussed, and the feasibility of studying aircraft attack-defense confrontation with reinforcement learning is analyzed. Second, models of attack-defense confrontation are established. Mathematical models of kinematics and dynamics and environment models of attack-defense confrontation are established separately for two objects, the intelligent car and the aircraft. For the intelligent car, the classical pursuit-evasion game and the one-versus-one and many-versus-many penetration scenarios in guarding a territory are built. For the aircraft, the construction of the penetration model for one attacker and one interceptor flying outside the atmosphere is introduced. Third, reinforcement learning algorithms applied to attack-defense confrontation are studied. To begin, the theoretical basis of reinforcement learning in continuous systems is outlined. The principles of using a fuzzy inference system to discretize the continuous state and action spaces, and of using multi-hidden-layer neural networks and deep recurrent networks to fit the mapping from inputs (continuous states) to outputs (continuous actions), are introduced. Next, by combining the fuzzy inference system with classical Q-learning, an interception algorithm based on FQL is proposed for the intelligent car. Then, using the multi-hidden-layer network, a control strategy based on DDPG is studied. Furthermore, using the multi-hidden-layer network and the deep recurrent network, a many-versus-many penetration algorithm based on MADDPG is proposed. For the missile, taking into account the characteristics of its maneuvers and control variables, warhead penetration algorithms based on DQN and DDPG are proposed. Fourth, the experiment and simulation platforms are constructed. For the intelligent car, the overall structure of the experiment system is designed, together with the implementation plans of the intelligent car subsystem, the indoor positioning subsystem, the control subsystem, and the wireless communication subsystem. For the missile, the logical structure of the 3-DOF simulation platform for missile attack-defense confrontation is designed, together with the specific implementation plans of the low-level driver module and the AI interface module. Finally, simulations and experiments are carried out to verify the algorithms. The simulation results of the attack-defense confrontation algorithms for the intelligent car and the experimental results of the pursuit-evasion game are presented to show their scalability and adaptability. The simulation results of the aircraft penetration algorithms based on DQN and DDPG are presented to show their feasibility and flexibility. |