As the concept of the sixth-generation (6G) communication system takes shape, data-intensive applications that demand large amounts of bandwidth are spreading into every aspect of production and daily life, which poses a major challenge to spectrum utilization. Spectrum, however, is the key resource for the development of communications, and it is seriously scarce. There are two ways to address spectrum scarcity: extending the spectrum to higher frequency bands, and improving the efficiency of spectrum utilization. Although higher frequency bands offer fast transmission speeds, their wavelengths are short, so the energy loss during transmission is significant. Therefore, how to improve spectrum utilization efficiency within the existing spectrum allocation has become a crucial issue for future communications.

The cognitive radio network, a combination of cognitive radio and cognitive networking, can effectively improve spectrum utilization and communication efficiency through spectrum allocation. In cognitive radio networks, users are divided into primary and secondary users according to channel occupancy status, and a primary user is either active or idle on its occupied frequency band. When primary users are idle, a large number of spectrum holes appear, and spectrum resources are wasted. To make rational use of these spectrum holes, researchers have proposed dynamic spectrum access technology, which significantly improves spectrum utilization efficiency by letting secondary users sense and opportunistically access the spectrum holes on the primary users' frequency bands. Nevertheless, to ensure communication quality, secondary users communicating on bands for which they are not licensed require stricter interference management and faster decisions, and access accuracy also depends on the precision of spectrum hole sensing.

The goal of this paper is to propose
a dynamic spectrum access strategy with fast decision speed and high access accuracy that uses deep reinforcement learning to improve spectrum utilization under the assumption that spectrum sensing results are not completely correct.

First, this paper establishes a discrete-time model of a multi-user multi-channel cognitive radio network and develops a distributed dynamic spectrum access strategy for each secondary user, improving spectrum utilization without affecting the performance of the primary network. This paper also protects the communication quality of primary users through an alarm signal method: when the channel gain of a primary user is low, warning information is sent to the primary system over a dedicated channel so that interference to the primary users can be controlled.

Secondly, because multi-user multi-channel cognitive wireless networks have many nodes and complex inter-node relationships, traditional algorithms require a large amount of computation for decision making. This paper proposes a dynamic spectrum access method for such networks based on an improved deep recurrent Q-network (DRQN), which makes full use of deep reinforcement learning, has cognitive capability matching that of cognitive wireless networks, and is strongly self-adaptive. To address the heavy computation caused by the large state space and partial observability of complex multi-user multi-channel cognitive wireless environments, a long short-term memory (LSTM) network is used as the Q-network in the deep Q-network (DQN) so that historical information is fully exploited to reduce computation, and a dropout layer is added to prevent overfitting. Meanwhile, to solve the Q-value overestimation problem in DQN, the double DQN (DDQN) network is used to train the estimated Q-value
and the action decision process with two separate networks, which avoids training the network with the same Q-values and improves prediction accuracy. Experimental results show that the method achieves high access accuracy and low interference.

Finally, this paper proposes a PER-DESQN-based dynamic spectrum access algorithm for multi-user multi-channel cognitive wireless networks. Because the complex structure of the LSTM network in the DRQN algorithm slows convergence, this paper uses an echo state network (ESN) as the Q-network to estimate Q-values from the underlying temporal correlation. Compared with LSTM, the ESN uses fixed weights instead of updating weights by conventional gradient descent, which greatly reduces training time. In addition, to address the unstable Q-values caused by random sampling from the experience replay buffer in the DDQN algorithm, the algorithm adopts a prioritized experience replay (PER) mechanism based on a sum tree, combined with importance sampling, so that the DDQN network draws samples from the experience pool by priority, which improves stability and access accuracy. Simulation experiments show that the PER-DESQN-based algorithm makes fast and accurate dynamic spectrum access decisions and significantly increases the system transmission rate.
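
As an illustration of the sum-tree prioritized experience replay with importance sampling mentioned above, the following is a minimal Python sketch and not the implementation used in this paper. The class and function names (`SumTree`, `priority_from_td_error`, `sample`), the stored transition format, and the hyperparameter values `alpha=0.6`, `beta=0.4`, and `eps=1e-3` are all assumptions made for the example.

```python
import random


class SumTree:
    """Array-backed binary tree: leaves hold priorities, internal nodes hold sums."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity - 1)  # internal nodes first, then leaves
        self.data = [None] * capacity           # stored transitions (format assumed)
        self.write = 0                          # next leaf slot to overwrite
        self.size = 0                           # number of stored transitions

    def total(self):
        return self.tree[0]

    def update(self, leaf, priority):
        """Set the priority of one leaf and propagate the change up to the root."""
        idx = leaf + self.capacity - 1
        delta = priority - self.tree[idx]
        self.tree[idx] = priority
        while idx > 0:
            idx = (idx - 1) // 2
            self.tree[idx] += delta

    def add(self, priority, transition):
        """Store a transition, overwriting the oldest one when the buffer is full."""
        self.data[self.write] = transition
        self.update(self.write, priority)
        self.write = (self.write + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def find(self, value):
        """Return the leaf whose cumulative-priority interval contains value."""
        idx = 0
        while idx < self.capacity - 1:  # loop while idx is an internal node
            left = 2 * idx + 1
            # Go left when value falls in the left subtree, or when the right
            # subtree is empty (guards against value == total).
            if value < self.tree[left] or self.tree[left + 1] == 0.0:
                idx = left
            else:
                value -= self.tree[left]
                idx = left + 1
        return idx - (self.capacity - 1)


def priority_from_td_error(td_error, alpha=0.6, eps=1e-3):
    """Priority p = (|delta| + eps) ** alpha; alpha and eps are assumed values."""
    return (abs(td_error) + eps) ** alpha


def sample(tree, batch_size, beta=0.4, rng=random):
    """Stratified sampling by priority, with normalized importance-sampling weights."""
    segment = tree.total() / batch_size
    batch = []
    for i in range(batch_size):
        leaf = tree.find(segment * (i + rng.random()))
        prob = tree.tree[leaf + tree.capacity - 1] / tree.total()
        weight = (tree.size * prob) ** (-beta)  # w = (N * P(i)) ** (-beta)
        batch.append((leaf, tree.data[leaf], weight))
    max_weight = max(w for _, _, w in batch)
    return [(leaf, tr, w / max_weight) for leaf, tr, w in batch]
```

In a training loop of this kind, each sampled weight would scale the loss of its transition, and `update` would be called with a new priority once the transition's temporal-difference error has been recomputed.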