With the continuous development of wireless communication systems, the explosive growth of data traffic and the shortage of wireless spectrum resources pose great challenges to mobile cellular networks. Device-to-device (D2D) communication is a technology in which, under the control of the base station, mobile terminals close to each other within a cell communicate directly, without forwarding through the base station or the core network. It can effectively improve the spectrum utilization of the system and alleviate spectrum scarcity. However, as dependence on wireless services grows, security threats to the integrity and availability of wireless communications have become a major problem. To realize the full advantages of D2D communication, interference control is therefore an important issue. For interference control problems, machine learning, and reinforcement learning in particular, does not require pre-labeled datasets for training and can explore solutions autonomously, so it has been widely used in wireless communication in recent years. In addition, deep learning can extract high-level features from raw data and combines well with reinforcement learning. In this paper, we study anti-jamming defense in a multiuser D2D communication scenario. Because D2D users and traditional cellular users share the uplink spectrum resources, there is co-channel interference between D2D users and cellular users in the system. Moreover, because wireless communication is inherently vulnerable, harmful interfering signals from outside the system degrade the performance of D2D communication that lacks an anti-jamming mechanism. To address this problem, this paper proposes an anti-jamming algorithm based on deep reinforcement learning. The specific research contributions are as follows:

(1) First, a centralized power control scheme based on the
base station is proposed, using power control as the anti-jamming method. To counter sweep jamming, physical concepts in the cellular network, namely the base station and D2D user equipment, throughput, and transmission power, are mapped to the reinforcement learning concepts of agents, rewards, and actions. By introducing the maximum-entropy idea and designing a reward function based on conditional throughput, an anti-jamming algorithm based on Soft Actor-Critic (SAC) is proposed, and a throughput-change index for D2D users in an interference environment is designed to evaluate the impact of interfering signals on communication conditions. Simulation results show that, compared with the traditional deep reinforcement learning algorithm Deep Q-Network (DQN) and a random method, the proposed algorithm achieves higher system throughput and faster convergence, shows a better anti-jamming effect, and increases channel reliability.

(2) Using a distributed Actor-Critic framework, the system is remodeled, and a distributed power control algorithm centered on each D2D user is proposed. The ratio of reliable channels in the cellular network where the D2D users are located, under interference, is designed as the anti-jamming performance index. The scheme is based on the multi-agent soft actor-critic (MASAC) algorithm with a maximum-entropy policy. Compared with the centralized control scheme, the problem is extended to a distributed setting, and better anti-jamming performance is obtained.

(3) To address the performance difference between the centralized power control anti-jamming algorithm and the distributed power control algorithm in (2), the reward function of the reinforcement learning algorithm is redesigned to obtain better convergence and anti-jamming performance.

(4) Building on the distributed power control scheme centered on each D2D user, and to address the limitation of relying on a single interference control method, joint frequency band allocation and power control is used
to avoid the influence of interference. By designing the reward function of each agent, an anti-jamming solution based on the distributed MASAC algorithm for joint power control and spectrum allocation is proposed. Simulation results show that, compared with the schemes in (1) and (2), this method effectively improves channel reliability and system throughput and increases spectrum utilization. The results also verify the advantages of maximum-entropy algorithms with stochastic policies for anti-jamming problems modeled with deep reinforcement learning.