As wireless communication technology develops, the channel environment is becoming increasingly complex. Because of its broadcast nature, wireless communication is highly vulnerable to malicious jammers. In unknown and adversarial spectrum environments, cognitive radio combined with machine learning makes anti-jamming methods increasingly flexible. Reinforcement learning is widely used in radio anti-jamming because it requires no model of the environment and learns the optimal strategy through continuous interaction between the agent and the environment.

First, for multi-user anti-jamming scenarios in wireless networks with a central service base station, this thesis models the problem as a multi-agent stochastic game and, building on Mean Field Multi-agent Reinforcement Learning (MFMARL), proposes a multi-user anti-jamming algorithm based on MF Deep SM2. Following the algorithm, the central service base station learns from the spectrum information observed by the cognitive users and from the reward information, finds the Nash equilibrium strategy of each cognitive user, and then distributes these strategies to the users, which effectively avoids jamming and reduces conflicts among users. Simulation results show that, with four cognitive users in a sweep jamming scenario, the anti-jamming performance of MF Deep SM2 is about 68.2% higher than that of Independent Q-learning, and 16.2% and 9.0% higher than that of the MF Q-learning and MF Deep Mellow algorithms, respectively.

Then, for scenarios in which the wireless network has neither central control nor online communication, the thesis addresses conflicts and jamming among multiple users by modeling the problem within a Markov framework and proposing the Multi-agent Joint Anti-jamming Decision Algorithm (MJADA). The algorithm combines Long Short-Term Memory (LSTM) with a Deep Q-Network (DQN); it not only converges effectively in multi-user scenarios with a very large state-action space, but also achieves good anti-jamming performance across different jamming scenarios. Simulation results show that, with two cognitive users in a sweep jamming scenario, the anti-jamming performance of the MJADA algorithm is about 72.3% higher than that of the random strategy and 33.7% higher than that of the independent DQN algorithm.
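
To make the evaluation setting concrete, the following is a minimal illustrative sketch of a sweep jamming scenario with a random-strategy baseline. It is not the thesis's simulator: the channel count, the one-channel-per-slot sweep, the success criterion (a slot succeeds for a user if its channel is neither jammed nor chosen by another user), and all function names are assumptions made only for illustration.

```python
import random


def sweep_jammer_channel(t, num_channels):
    """Assumed jammer model: jam one channel per slot, stepping cyclically."""
    return t % num_channels


def success_rate(policy, num_users=2, num_channels=6, num_slots=10_000, seed=0):
    """Fraction of (user, slot) pairs that are neither jammed nor in conflict."""
    rng = random.Random(seed)
    successes = 0
    for t in range(num_slots):
        jammed = sweep_jammer_channel(t, num_channels)
        choices = [policy(u, t, num_channels, rng) for u in range(num_users)]
        for ch in choices:
            collided = choices.count(ch) > 1  # another user picked the same channel
            if ch != jammed and not collided:
                successes += 1
    return successes / (num_users * num_slots)


def random_policy(user, t, num_channels, rng):
    """Random-strategy baseline: pick a channel uniformly at random."""
    return rng.randrange(num_channels)


def sweep_aware_policy(user, t, num_channels, rng):
    """Hypothetical oracle that knows the sweep pattern (not a learned policy).

    Each user stays a distinct number of channels ahead of the jammer, so the
    users avoid both the jammed channel and each other.
    """
    return (t + 1 + user) % num_channels


if __name__ == "__main__":
    print("random strategy:", success_rate(random_policy))
    print("sweep-aware oracle:", success_rate(sweep_aware_policy))
```

Under these assumptions, with two users and six channels, the random strategy succeeds with probability (5/6) × (5/6) ≈ 0.69 per user per slot, while the oracle never fails. A learning algorithm such as those proposed here has to discover the sweep pattern and a conflict-free channel allocation from observations and rewards alone.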