| Owing to the explosive growth in media devices worldwide, spectrum resources are becoming increasingly scarce. Cognitive radio (CR) technology greatly alleviates this problem: it senses spectrum holes in the external environment and transmits data dynamically on different frequency bands. This ability to sense the external environment can also be used in wartime to counter enemy jamming. The key technologies of CR are spectrum sensing and spectrum access. Sensing gives CR the ability to perceive the external environment in real time, while spectrum access acts as its intelligent brain, analyzing the external spectrum environment to decide which channel a user selects for data transmission. In a wartime environment the enemy’s jamming pattern cannot be known in advance, so reinforcement learning (RL) can be used to learn the jammer’s activity through continuous interaction between the user and the environment, predict the spectrum state, and combine that prediction with the sensing result to resist jamming. The main work of this thesis consists of two parts. First, for the single-node cognitive anti-jamming problem, this thesis studies the widely used Q-learning algorithm in depth, redesigns the reward function to reduce channel hopping and improve anti-jamming performance, and applies reinforcement learning in both the sensing phase and the transmission phase to speed up learning. Combining the sensing results with the reinforcement learning algorithm yields a single-node cognitive anti-jamming algorithm. With the improved reward function, the algorithm reduces how often a user hops between channels, which avoids unnecessary energy consumption and is both economical and environmentally friendly. The algorithm also lowers the probability that a user is jammed and improves system performance. The two-stage reinforcement learning method increases the user’s data transmission time, which matters greatly in a confrontation environment where time is precious. Simulation experiments verify the system’s anti-jamming capability and its ability to increase data transmission time, and the robustness of the algorithm is verified by applying different sensing error rates to the system. Second, independent reinforcement learning in a multi-agent environment causes users to interfere with one another; to solve this problem, this thesis proposes a sparse learning algorithm based on multi-agent reinforcement learning (MARL). In a multi-agent environment, the algorithm addresses the anti-jamming performance of the whole system through cooperation: each user takes into account the influence of its neighbors’ selected actions on itself, and this sparse learning improves overall anti-jamming performance. Finally, simulations verify the algorithm’s performance advantage over independent learning and sensing-based algorithms, and examine the impact of different sensing error rates on system performance. |
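
The single-node idea summarized above, Q-learning with a reward that discourages unnecessary channel hopping, can be illustrated with a minimal sketch. It is not the thesis's actual algorithm or reward function. The sweep jammer, the perfect sensing of the jammer's last position, the reward values (+1, -1, hop cost 0.25), the hyperparameters, and all function names are assumptions chosen for illustration, and the sketch covers only a single learning stage rather than the two-stage scheme.

```python
import random

def hop_aware_reward(prev_channel, channel, jammed, hop_cost=0.25):
    """+1 for a clean slot, -1 for a jammed slot, minus a cost when the user hops.
    The numeric values are illustrative assumptions."""
    base = -1.0 if jammed else 1.0
    return base - (hop_cost if channel != prev_channel else 0.0)

def step(state, action, num_channels):
    """Hypothetical sweep jammer: it moves to the next channel in every slot."""
    channel, jammer = state
    jammer = (jammer + 1) % num_channels
    jammed = action == jammer
    return (action, jammer), hop_aware_reward(channel, action, jammed), jammed

def greedy(q, state, num_channels):
    """Channel with the highest Q-value in this state (ties go to the lowest index)."""
    return max(range(num_channels), key=lambda a: q.get((state, a), 0.0))

def train(num_channels=4, slots=20000, alpha=0.1, gamma=0.5, epsilon=0.1, seed=0):
    """Tabular Q-learning; state = (current channel, last sensed jammer position)."""
    rng = random.Random(seed)
    q = {}
    state = (0, 0)
    for _ in range(slots):
        # Epsilon-greedy channel selection.
        if rng.random() < epsilon:
            action = rng.randrange(num_channels)
        else:
            action = greedy(q, state, num_channels)
        next_state, r, _ = step(state, action, num_channels)
        # Standard Q-learning update toward r + gamma * max_a' Q(s', a').
        best_next = max(q.get((next_state, a), 0.0) for a in range(num_channels))
        old = q.get((state, action), 0.0)
        q[(state, action)] = old + alpha * (r + gamma * best_next - old)
        state = next_state
    return q

def evaluate(q, num_channels=4, slots=400):
    """Run the greedy policy and count jammed slots and channel hops."""
    state, jams, hops = (0, 0), 0, 0
    for _ in range(slots):
        action = greedy(q, state, num_channels)
        hops += action != state[0]
        state, _, jammed = step(state, action, num_channels)
        jams += jammed
    return jams, hops

q_table = train()
jams, hops = evaluate(q_table)
print(f"jammed slots: {jams}/400, channel hops: {hops}/400")
```

Under these assumptions a user that never hops would be jammed in one slot out of four, so the jammed-slot count of the greedy policy gives a rough check that the hop penalty has not suppressed the hops that are actually needed.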