Differential games involving conflict and antagonism play an important role in the military field. Reinforcement learning has attracted wide attention in the study of complex nonlinear systems and multi-agent systems because of its strong learning performance. In this thesis, two reinforcement learning algorithms, Minimax Q-learning and fuzzy Q-learning, are used to solve a typical differential game problem: the aircraft pursuit-evasion game.

First, the difficulties of solving differential games and the problems faced by reinforcement learning are introduced, and the theory and main algorithms of reinforcement learning are explained. The thesis then sets out the theory of differential games and establishes a differential game model of the pursuit-evasion problem. The system is described by the relative motion state of the two aircraft, the state equations of the pursuit-evasion model are simplified, and the symmetric relationship between the system states and the control quantities of the two sides is analyzed.

Next, Minimax Q-learning is used to solve for the control policies of the pursuer and the evader. The pursuit-evasion game is transformed into a zero-sum game, and a reinforcement learning model is established on the basis of the simplified state equations. Under the condition that the pursuer knows the evader's action at the current moment, the symmetric relationship between the system states and the control quantities of both sides is used to improve the learning efficiency of the Q-values, and the off-line Q-matrix obtained by the Minimax Q-learning algorithm is used to guide both sides in choosing their policies. Simulation results verify the feasibility of the method.

Finally, a non-zero-sum fuzzy Q-learning model is established for each agent, and the optimal control quantities of both sides are calculated. Fuzzy Q-learning can generate globally continuous actions for agents in a continuous-time system, which overcomes the discontinuity of the control quantities produced by Minimax Q-learning. In practice, an agent cannot observe the other side's action at the current moment. Therefore, the fuzzy Q-learning models of both sides are established and solved under this condition, and the control policies of both parties are computed from the off-line Q-matrices. Simulation results show the effectiveness of the method, and a comparison with Minimax Q-learning shows the practicality of fuzzy Q-learning in continuous-time systems.
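The abstract does not give the rule base or update law used for fuzzy Q-learning, so the following is only a minimal Python sketch of one common formulation, not the thesis's implementation. It illustrates why the method yields continuous controls: each fuzzy rule picks a discrete action, and the global control is the membership-weighted blend u(x) = Σ φ_i(x)·a_{k_i}, so u varies smoothly with the state instead of jumping between the discrete actions of a Q-matrix. Everything specific here is an assumption: a one-dimensional normalised state (for example a relative angle), triangular membership functions on a grid, a shared candidate action set per rule, ε-greedy selection inside each rule, the parameter values, the toy task in the demo, and the hypothetical names `triangular_memberships` and `FuzzyQLearner`.

```python
import random


def triangular_memberships(x: float, centers: list[float]) -> list[float]:
    """Firing strengths of overlapping triangular fuzzy sets on a sorted grid.

    Adjacent sets overlap so the strengths always sum to 1; x is clipped to
    the grid range [centers[0], centers[-1]].
    """
    x = min(max(x, centers[0]), centers[-1])
    phi = [0.0] * len(centers)
    for i in range(len(centers) - 1):
        left, right = centers[i], centers[i + 1]
        if left <= x <= right:
            t = (x - left) / (right - left)  # relative position inside the interval
            phi[i] = 1.0 - t
            phi[i + 1] = t
            break
    return phi


class FuzzyQLearner:
    """Hypothetical fuzzy Q-learning agent: one q-vector per fuzzy rule."""

    def __init__(self, centers, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.centers = centers    # rule centres over the normalised state (assumed)
        self.actions = actions    # candidate discrete controls shared by all rules
        self.alpha = alpha        # learning rate
        self.gamma = gamma        # discount factor
        self.epsilon = epsilon    # exploration rate applied inside each rule
        self.q = [[0.0] * len(actions) for _ in centers]

    def act(self, x, rng):
        """Choose a discrete action per rule, then blend them into one control."""
        phi = triangular_memberships(x, self.centers)
        chosen = []
        for qi in self.q:
            if rng.random() < self.epsilon:
                chosen.append(rng.randrange(len(self.actions)))  # explore
            else:
                chosen.append(max(range(len(qi)), key=qi.__getitem__))  # greedy
        # Continuous global action: membership-weighted sum of the rule actions.
        u = sum(p * self.actions[k] for p, k in zip(phi, chosen))
        # Q-value of that global action: the same weighting over rule q-values.
        q_value = sum(p * qi[k] for p, qi, k in zip(phi, self.q, chosen))
        return u, q_value, phi, chosen

    def value(self, x):
        """Greedy state value V(x) = sum_i phi_i(x) * max_k q_i[k]."""
        phi = triangular_memberships(x, self.centers)
        return sum(p * max(qi) for p, qi in zip(phi, self.q))

    def update(self, phi, chosen, q_value, reward, x_next):
        """TD update, credited to each rule in proportion to its firing strength."""
        td_error = reward + self.gamma * self.value(x_next) - q_value
        for i, (p, k) in enumerate(zip(phi, chosen)):
            self.q[i][k] += self.alpha * td_error * p
        return td_error


if __name__ == "__main__":
    # Toy 1-D task (an assumption, not the pursuit-evasion model):
    # steer a normalised angle x toward 0 with x_next = x + 0.1 * u.
    rng = random.Random(0)
    agent = FuzzyQLearner(centers=[-1.0, -0.5, 0.0, 0.5, 1.0], actions=[-1.0, 0.0, 1.0])
    for _ in range(200):
        x = rng.uniform(-1.0, 1.0)
        for _ in range(30):
            u, q_value, phi, chosen = agent.act(x, rng)
            x_next = max(-1.0, min(1.0, x + 0.1 * u))
            agent.update(phi, chosen, q_value, -abs(x_next), x_next)
            x = x_next
    agent.epsilon = 0.0  # report the greedy, continuous policy
    print([round(agent.act(s, rng)[0], 2) for s in (-0.8, -0.3, 0.3, 0.8)])
```

In a two-player setting such as the non-zero-sum formulation above, each aircraft would plausibly hold its own learner with its own reward and treat the opponent as part of the environment, which fits the condition that the opponent's current action is unobserved; this is an interpretation, not a detail stated in the abstract.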