Research On Reinforcement Learning Based Communication Jamming Strategy Learning Methods

Posted on: 2020-07-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S S S Zhuan
GTID: 1368330611492995
Subject: Information and Communication Engineering

Abstract/Summary:
As electronic warfare (EW) grows in importance in modern warfare, control of the electromagnetic spectrum has become a key means of winning on the battlefield. The complexity of the jamming environment, together with the use of various anti-jamming methods and artificial intelligence (AI) technologies, greatly increases the difficulty of successful jamming. The concept of cognitive jamming has narrowed the gap in game ability between the two opposing sides. In particular, reinforcement learning (RL) theory has been applied to communication jamming strategy learning: the jamming equipment continually adjusts its jamming strategy while interacting with the environment, overcomes the influence of unknown factors on learning, and eventually learns the optimal jamming strategy. However, current research on jamming strategy learning still has problems, most notably the excessive number of interactions required and the limited range of application scenarios. This dissertation studies jamming strategy learning methods for different jamming scenarios. The main contents are as follows:

(1) The complex and unknown electromagnetic environment distorts the constellation of the target signal to varying degrees, so the classic optimal jamming strategy is often no longer optimal. To learn the optimal jamming scheme for distorted signals, a new method of constructing jamming schemes for high-order modulated (HOM) signals is proposed, and different kinds of jamming schemes are then constructed by orthogonal decomposition (OD). Existing RL algorithms need many trials and converge slowly when learning the optimal jamming strategy. This dissertation improves the timeliness of optimal jamming strategy learning from two perspectives, search and prediction.

On search strategies: (a) Exploiting the correlation between discretely divided jamming actions, a jamming strategy learning algorithm based on positive reinforcement learning (PRL) is proposed. It reduces the interactions required for learning by increasing the probability that the optimal strategy is selected. (b) To reduce the randomness of strategy selection in PRL, a dual reinforcement learning (DRL) algorithm is proposed, which narrows the search scope of the optimal strategy by adding constraints and further reduces the interactions required. (c) To reduce the randomness of the search direction in the DRL algorithm, a local search (LS) algorithm is proposed. It reduces the interactions required by approaching the optimal strategy step by step, and it can jam while learning. Simulation experiments show that, across these three search-based algorithms, the number of interactions required decreases progressively and the jamming efficiency during learning improves progressively.

On prediction strategies: (a) Exploiting the monotonically increasing shape of the jamming strategy value function curve, a jamming strategy learning algorithm based on monotonic cubic spline interpolation is proposed. It predicts the value function (PVF) curve by non-uniform interpolation and then determines the optimal jamming strategy from the prediction. (b) To overcome the difficulty of selecting interpolation points, a jamming strategy learning algorithm based on value function matching (MVF) is proposed. It uses the characteristics of the noise distribution to construct a library of value function curves in advance, and then predicts the true curve from a small number of sample points using the orthogonal matching pursuit (OMP) method. (c) Combining the local optimization ability of the search method with the global planning ability of the prediction method, a jamming strategy learning algorithm based on local search and value function prediction is proposed. Simulation experiments show that the interactions required by these three prediction-based strategy learning algorithms decrease in turn, which makes RL more practical for jamming strategy learning.

(2) After being jammed, the enemy tries to restore normal communication and reduce the impact of jamming by increasing the signal power, switching transmission channels, changing modulation patterns, and other methods. In this case, the jammer must learn the mapping between environment states and jamming actions so as to maximize the cumulative reward over the jamming process. (a) After modeling the jamming problem as a Markov decision process (MDP), a jamming strategy learning algorithm based on apprenticeship learning is proposed. It takes jamming experience as the expert strategy, constructs the reward function from state features, and obtains new jamming strategies by learning the feature weights. It needs far fewer interactions to converge than the Q-learning algorithm. (b) For the case where the enemy uses cognitive radio (CR) technology to dynamically select access channels, an apprenticeship-learning-based algorithm for learning a cognitive radio jamming strategy is proposed. The cognitive user's channel selection history serves as the expert strategy, and the enemy's channel selection strategy is predicted from a state value function represented by 8 proposed features. Simulation experiments show that the proposed algorithm achieves better jamming effects.

(3) Under sustained jamming, the enemy will also change network routes to avoid the jamming. Jamming a single node in the network is then not enough to deny communication. (a) To jam the target network, a multi-node jamming strategy learning algorithm based on an improved combinatorial upper confidence bound (CUCB) algorithm is proposed. It uses a reasonable credit-assignment method, updates the reward information of nodes with the UCB algorithm, and maximizes the jamming effect by jamming the nodes with higher reward values. (b) To further improve the network jamming effect, a multi-node jamming strategy learning algorithm based on node correlation is proposed. It guides the selection of nodes to jam by constructing a node correlation matrix, and updates the matrix with the rewards obtained from interaction. Simulation experiments show that the two proposed multi-node jamming strategy learning algorithms achieve better jamming effects and are robust to the environment, and they verify the validity of the new network-layer reward standard proposed in this dissertation.

(4) When the enemy uses adaptive nulling (zero-adjusting) antennas to cancel jamming directed at its specific communication targets, a single jammer cannot jam effectively, and a cooperative jamming strategy learning method for multiple jammers is needed. (a) When the jammers share a control center, the center uses the existing search or prediction methods to learn the jamming strategy and assigns jamming tasks to the jammers it controls. (b) When multiple jammers are networked, a convention-based multi-jammer cooperative jamming strategy is proposed. Under the constraints of the convention, the task of each jammer gradually becomes clear as interaction proceeds. (c) When multiple jammers cannot communicate with one another because of jamming, a multi-jammer cooperative jamming strategy based on self-confidence is proposed. Each jammer updates its self-confidence value according to its own jamming actions and the environmental feedback, and uses that value to guide its subsequent choice of jamming actions. Simulation experiments show that, after a small number of interactions, all three cooperative jamming strategies jam the target signal effectively and achieve a high jammer utilization rate.

This dissertation studies the application of RL theory to jamming strategy learning under different jamming tasks. The results obtained provide a reference for further study of cognitive jamming.
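
To make the node-selection idea in part (3) more concrete, here is a minimal sketch of a generic combinatorial UCB loop. It is not the dissertation's improved CUCB algorithm, whose credit-assignment rule and network-layer reward are not given in the abstract. The sketch assumes that each jammed node returns an independent 0/1 reward, that the jammer observes the reward of every node it jams in a round, and that exactly k nodes are jammed per round. The names cucb_select and run_cucb, the success probabilities, and the exploration constant 1.5 (taken from the standard CUCB index) are illustrative choices, not values from the source.

```python
import math
import random


def cucb_select(counts, means, t, k):
    """Return the k node indices with the highest UCB index at round t."""
    def ucb_index(i):
        if counts[i] == 0:
            # Unplayed nodes get an infinite index so each is tried once.
            return float("inf")
        # Standard CUCB adjustment: mean + sqrt(3 ln t / (2 T_i)).
        return means[i] + math.sqrt(1.5 * math.log(t) / counts[i])

    ranked = sorted(range(len(counts)), key=ucb_index, reverse=True)
    return ranked[:k]


def run_cucb(success_prob, k, rounds, seed=0):
    """Simulate jamming k nodes per round against hypothetical Bernoulli rewards.

    success_prob[i] is an assumed probability that jamming node i pays off;
    it stands in for the unknown environment and is hidden from the learner.
    """
    rng = random.Random(seed)
    n = len(success_prob)
    counts = [0] * n    # how many times each node has been jammed
    means = [0.0] * n   # running mean reward observed for each node

    for t in range(1, rounds + 1):
        for i in cucb_select(counts, means, t, k):
            reward = 1.0 if rng.random() < success_prob[i] else 0.0
            counts[i] += 1
            # Incremental update of the empirical mean reward.
            means[i] += (reward - means[i]) / counts[i]
    return counts, means


if __name__ == "__main__":
    counts, means = run_cucb([0.9, 0.8, 0.1, 0.1, 0.1], k=2, rounds=2000)
    print("times jammed per node:", counts)
    print("estimated mean reward:", [round(m, 2) for m in means])
```

In this toy setting the learner concentrates its jamming on the nodes with the highest estimated reward while still sampling the others occasionally, which is the exploration and exploitation trade-off that UCB-style indices are designed to balance.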
Keywords/Search Tags:cognitive jamming, reinforcement learning, local search, Markov decision process, apprenticeship learning, cognitive radio, wireless ad hoc network, multi-agent system