| In modern warfare, with the continuous development of electronic information and artificial intelligence technology, radar countermeasures are becoming increasingly intelligent and multifunctional: radar working modes are multiplying and anti-jamming capabilities are steadily improving. Traditional jamming decision-making methods, with their long decision-making process and low jamming efficiency, struggle to adapt to the increasingly complex modern battlefield environment. To improve jamming decision-making performance in the radar countermeasure process, this thesis studies an intelligent jamming decision-making method based on reinforcement learning, which a jamming system can use to decide both the jamming pattern and the jamming parameters. Simulation experiments show that the proposed method achieves better jamming performance against multifunctional radar. The main research work of the thesis is as follows. Firstly, the thesis analyzes the radar countermeasure process, compares the advantages and disadvantages of traditional and intelligent radar countermeasures, studies the relevant reinforcement learning algorithms, and conducts related simulation experiments. On this basis, a jamming decision model based on Q-learning is established by referring to the jamming process of intelligent radar. Simulation experiments verify the rationality of intelligent jamming decision-making based on the Q-learning algorithm and analyze how the algorithm parameters, the state transition probability, and prior knowledge affect the algorithm's performance. Secondly, for the jamming parameter decision problem, a multi-armed bandit jamming model is established and the Jamming Bandit (JB) algorithm is introduced. To address the low average reward and low accuracy caused by its discretization process, the thesis improves the JB algorithm through stepwise discretization and weighted estimation of the jamming parameter reward, yielding the Jamming Bandit based on Stepwise Discretization (JBSD) algorithm. Theoretical analysis and numerical simulation of cross-eye jamming show that, compared with the JB algorithm, the JBSD algorithm achieves a higher average reward and faster convergence. Finally, because a practical jamming system must decide both the jamming pattern and the jamming parameters, the intelligent jamming decision-making process based on reinforcement learning is divided into two steps. Combining the Q-learning algorithm with the JBSD algorithm, a jamming decision-making method based on two-layer reinforcement learning is proposed. The influence of environmental factors, such as the radar state transition probability and the difference between the observed reward and the actual reward, on the method's decision-making performance is analyzed. The experimental results show that the method is sensitive to the radar state transition probability but is comparatively robust to the difference between the observed reward and the actual reward. |
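
To make the first decision layer concrete, the following is a minimal sketch of tabular Q-learning choosing a jamming pattern for each radar working mode. It is not the thesis's model. The four radar modes, three jamming patterns, the `EFFECTIVE` table, the reward values, and the names `step`, `train`, and `greedy_policy` are all hypothetical choices made for illustration. The parameter `p_success` stands in for the radar state transition probability discussed in the abstract.

```python
import random

# Assumed toy setup (not from the thesis): 4 radar working modes, where
# mode 0 is the lowest-threat mode and ends an episode, and 3 jamming patterns.
N_STATES, N_ACTIONS = 4, 3
# Assumed: the single jamming pattern that can degrade each non-terminal mode.
EFFECTIVE = {1: 2, 2: 0, 3: 1}


def step(state, action, p_success, rng):
    """Toy radar response: return (next_state, reward)."""
    if action == EFFECTIVE[state] and rng.random() < p_success:
        next_state = state - 1  # radar pushed to a lower-threat mode
    else:
        # Radar escalates, or stays put if already in the highest-threat mode.
        next_state = min(state + 1, N_STATES - 1)
    reward = 10.0 if next_state == 0 else -1.0
    return next_state, reward


def train(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.1,
          p_success=0.9, seed=0):
    """Learn a Q-table with epsilon-greedy tabular Q-learning."""
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        state = N_STATES - 1  # start each episode in the highest-threat mode
        for _ in range(50):   # cap on episode length
            if rng.random() < epsilon:
                action = rng.randrange(N_ACTIONS)  # explore
            else:
                action = max(range(N_ACTIONS), key=lambda a: q[state][a])
            next_state, reward = step(state, action, p_success, rng)
            # The terminal mode contributes no future value.
            if next_state == 0:
                target = reward
            else:
                target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if state == 0:
                break
    return q


def greedy_policy(q):
    """Best-known jamming pattern for each non-terminal radar mode."""
    return {s: max(range(N_ACTIONS), key=lambda a: q[s][a])
            for s in range(1, N_STATES)}
```

Calling `greedy_policy(train())` returns the learned mode-to-pattern mapping. Lowering `p_success` makes the radar's response to jamming less predictable, which is one simple way to explore how sensitive the learned policy is to the state transition probability.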