| In recent years, the growing bandwidth demand of mobile multimedia services has made radio spectrum resources increasingly scarce, so advanced radio transmission technologies that use the spectrum more efficiently are needed. Cognitive radio (CR) is widely regarded as a promising solution to the spectrum shortage and is an important enabler of dynamic spectrum access in 5G (5th Generation Mobile Communication Technology) systems. Network virtualization can create network slices and customize them according to different service requirements. Network slicing can significantly improve spectral efficiency (SE) and support services with strict quality of service (QoS) requirements. In addition, mobile edge computing (MEC) is a key technology for meeting the high-reliability and low-latency requirements of mobile communication systems. The optimization problems that arise in cognitive wireless network slicing scenarios are often non-convex, the communication environment changes dynamically as transmission tasks proceed, and the overall computational complexity of the system is high. Reinforcement learning (RL) is therefore used to select the best action in each state of the system environment. RL is a trial-and-error learning method that interacts with the dynamic communication environment and learns the optimal resource allocation strategy. This paper proposes resource allocation frameworks based on reinforcement learning and carries out work in the following three aspects to improve network performance. (1) For a downlink multi-user CR-NOMA network, a cognitive radio network resource allocation scheme based on non-orthogonal multiple access (NOMA) with a hybrid spectrum access mode is proposed. Secondary users (SUs) use NOMA for multiplexing and the hybrid spectrum access mode to further improve the spectral efficiency. Considering the demand for multiple services, the enhanced mobile 
broadband (eMBB) slice and the ultra-reliable low-latency communication (URLLC) slice are established. The goal is to jointly optimize the SE and the QoS of the users. A mapping between the resource allocation problem and the reinforcement learning algorithm is established for the CR-NOMA network. According to the signal-to-interference-plus-noise ratio (SINR) of the primary users (PUs), the proposed scheme outputs the optimal channel selection and power allocation of the SUs. Simulation results reveal that the proposed scheme converges faster and obtains higher rewards than the Q-Learning scheme. Additionally, the proposed scheme achieves higher SE than both the overlay-only and underlay-only modes. (2) For a downlink cognitive radio network, a cognitive network resource allocation scheme based on Actor-Critic is proposed to coordinate the allocation of accessible spectrum resources in both licensed and unlicensed bands, so that secondary users in the licensed and unlicensed bands can coexist harmoniously. Multiple slices are established for different services. While ensuring the QoS, the spectral efficiency of the cognitive network in the unlicensed band is improved as much as possible, and the cost of occupying licensed spectrum resources is reduced. The proposed scheme can handle a continuous action space and select the optimal channel and power for the SUs. Simulation results reveal that the proposed scheme obtains higher rewards and a lower outage rate than the deep Q-network (DQN) scheme. (3) For an uplink cognitive radio network, a slicing resource allocation scheme based on deep deterministic policy gradient (DDPG) is proposed for joint communication and edge computing networks. Considering the demand for multiple services, the URLLC slice and the eMBB slice are established. Users on the URLLC slice offload their computing tasks to the MEC server for processing. While ensuring the QoS, the network spectral efficiency is improved as much as possible and the energy consumption is reduced. The resource allocation is formulated as a constrained optimization problem and the 
neural network model is designed according to the simulation scenario. Simulation results reveal that the proposed scheme obtains higher rewards and a lower outage rate than the DQN scheme. |