
Research On Radio Resource Scheduling Algorithm For Mobile Communications Based On Deep Reinforcement Learning And User Location Information

Posted on: 2023-09-20
Degree: Master
Type: Thesis
Country: China
Candidate: X N Li
Full Text: PDF
GTID: 2558306908965279
Subject: Communication and Information System
Abstract/Summary:
As mobile communication networks grow larger, interference becomes more complex, and services become more diversified, the contradiction between the shortage of spectrum resources and the increasing demands of users becomes ever more serious. To effectively improve spectrum resource utilization and user service quality in such complex communication environments, it is essential to study more efficient resource scheduling algorithms. This thesis therefore improves the performance of resource scheduling algorithms by further mining the intrinsic knowledge of mobile communication systems and combining it with deep reinforcement learning. The specific work is as follows.

First, this thesis introduces the technical foundations of deep reinforcement learning and resource scheduling, and proposes a novel resource scheduling algorithm called user-location-based proximal policy optimization (PPO) with reward shaping (UL-PPORS). On the one hand, a user location network is designed to help the agent obtain user location information without additional signaling overhead, thus providing more valuable state information to the agent. On the other hand, the reward function is shaped separately for high, medium, and low traffic intensities, ensuring that the agent obtains higher spectral efficiency under each traffic intensity.

Subsequently, the algorithm design is verified by simulation. Through a comprehensive analysis of the cumulative-reward convergence curve, the system-throughput convergence curve, the system resource-occupancy curve, and the system interference curve, the effectiveness of the reward-shaping design is verified. Then, based on this reward-shaping function, the UL-PPORS algorithm and a PPO resource allocation algorithm without user location information (the PPO algorithm) are simulated and compared under different traffic intensities. The results show that UL-PPORS can effectively mitigate system interference and achieve higher spectral efficiency: compared with the PPO algorithm, the spectral efficiency of UL-PPORS improves by up to 10.2%.

Finally, building on UL-PPORS, this thesis proposes a novel resource scheduling algorithm called graph convolutional network based PPO with reward shaping (GCN-PPORS) to further improve scheduling performance. The algorithm uses the user location information obtained from the user location network to model an irregular mobile communication scenario as a graph structure that reflects the interference relationships between users. Meanwhile, a policy network built on a graph convolutional network (GCN) is designed, exploiting the GCN's ability to extract spatial features from graph data to further optimize the resource scheduling policy. The GCN-PPORS and UL-PPORS algorithms are then compared under the same simulation parameter settings. The results show that GCN-PPORS obtains a resource allocation strategy with lower interference and higher system throughput, and that its performance remains excellent as the scale of the communication network increases: compared with UL-PPORS, the spectral efficiency of GCN-PPORS improves by up to 17.8%.
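The traffic-dependent reward shaping described above can be sketched as a function that weights spectral efficiency against an interference penalty differently per load regime. This is only a minimal illustration: the thresholds, weights, and the exact form of the base reward are hypothetical assumptions, not the thesis's actual shaping functions.

```python
# Illustrative traffic-aware reward shaping for a resource scheduling agent.
# Assumed (hypothetical) base reward: spectral efficiency minus a weighted
# interference penalty; the regime thresholds 0.3/0.7 are arbitrary examples.

def shaped_reward(spectral_efficiency: float,
                  interference_level: float,
                  traffic_intensity: float) -> float:
    """Shape the reward differently for low, medium, and high traffic.

    traffic_intensity is assumed normalized to [0, 1].
    """
    base = spectral_efficiency
    if traffic_intensity < 0.3:
        # Low load: resources are plentiful, so penalize interference heavily.
        return base - 0.5 * interference_level
    elif traffic_intensity < 0.7:
        # Medium load: balance throughput against interference.
        return base - 0.2 * interference_level
    else:
        # High load: emphasize throughput, tolerate more interference.
        return 1.5 * base - 0.1 * interference_level
```

In a PPO training loop, this shaped scalar would simply replace the raw reward returned by the environment at each scheduling step.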
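The graph modeling step in GCN-PPORS can be sketched as follows: user positions are turned into an interference graph (users within some radius are assumed to interfere), and a GCN layer propagates features over that graph. The distance-threshold adjacency and the Kipf-Welling-style propagation shown here are standard simplifications, assumed for illustration rather than taken from the thesis.

```python
import numpy as np

def interference_graph(positions: np.ndarray, radius: float) -> np.ndarray:
    """Build an adjacency matrix where users closer than `radius` interfere.

    positions: (N, 2) array of user coordinates.
    Returns an (N, N) 0/1 adjacency matrix with zero diagonal.
    """
    # Pairwise Euclidean distances between all users.
    dists = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    adj = (dists < radius).astype(float)
    np.fill_diagonal(adj, 0.0)  # no self-interference edges
    return adj

def gcn_layer(adj: np.ndarray, features: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One GCN propagation: ReLU(D^{-1/2} (A + I) D^{-1/2} H W)."""
    a_hat = adj + np.eye(adj.shape[0])               # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    return np.maximum(0.0, d_inv_sqrt @ a_hat @ d_inv_sqrt @ features @ weight)
```

A policy network would stack such layers on per-user features (e.g. position, channel state) so that each user's embedding aggregates information from its interfering neighbors before action logits are produced.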
Keywords/Search Tags:Resource Scheduling, Deep Reinforcement Learning, User Location Information, Proximal Policy Optimization, Reward Shaping, Graph Convolutional Network