Improving the utilization efficiency of limited radio resources has always been a key research direction in mobile communication technology. Owing to unstable link quality, varying traffic volume, diverse service types, user mobility, and other factors in the communication system, resource scheduling in complex environments with large-scale information requires stronger computing power and decision-making ability. Combining deep reinforcement learning techniques, this thesis studies and analyzes downlink radio resource scheduling algorithms, aiming to extract the potential value of user information and schedule resources intelligently, thereby improving resource utilization efficiency, user quality of service, and other key aspects of system performance.

First, to address the low resource utilization and inflexibility of traditional fixed resource scheduling methods, the radio resource scheduling problem is modeled as a Markov decision process, and a downlink radio resource scheduling algorithm based on proximal policy optimization (PPO) is designed. Considering the joint information and dimensional characteristics of each base station in the communication environment, spectrum resources are scheduled independently and reasonably according to user requests. A simulation scheme is designed using the NS3-Gym framework, including modules for user distribution, service intensity, action masking, reward alignment, and the neural network. In addition, a reward function that considers both user throughput and fairness is designed as the basis for the agent's decisions and policy updates, with a weight factor added to adjust the emphasis on throughput versus fairness according to different needs.

The algorithm is then tested with different weight factors under uniform and non-uniform user distribution scenarios. The value-function (V-value) and cumulative-reward convergence curves verify that the algorithm adapts well to both scenarios and converges to a stable policy. The configuration with the best overall performance is selected according to the spectrum efficiency curve and compared with commonly used scheduling algorithms under the same conditions. The results show that the algorithm achieves better spectrum efficiency and fairness in both uniform and non-uniform scenarios.

Finally, this thesis optimizes the resource scheduling model based on an analysis of the characteristics of highly dynamic scenarios. Building on proximal policy optimization, a downlink radio resource scheduling algorithm suitable for highly dynamic, multi-service mobile communication systems is designed, and motion, handover, and quality-of-service (QoS) modules are designed and implemented in the simulation scheme. In particular, a recurrent neural network (RNN) is combined with the Actor network to analyze and process user movement trajectories and handover information in the time-series data, so that the agent can better handle highly dynamic scenarios; to improve user service satisfaction, a new reward function is designed that takes delay and service priority into account. Simulation experiments at different user speeds show that, compared with common scheduling algorithms, the proposed algorithm effectively reduces transmission delay and retransmission rate, improves user service satisfaction, and balances spectrum efficiency with fairness.
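For context, the PPO algorithm that the thesis builds on optimizes the standard clipped surrogate objective (this is the published PPO formulation, not a formula taken from the thesis itself):

\[
L^{\mathrm{CLIP}}(\theta) = \hat{\mathbb{E}}_t\!\left[\min\!\big(r_t(\theta)\hat{A}_t,\ \mathrm{clip}(r_t(\theta),\,1-\epsilon,\,1+\epsilon)\,\hat{A}_t\big)\right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)},
\]

where \(\hat{A}_t\) is the advantage estimate and \(\epsilon\) is the clipping range. The clipping keeps each policy update close to the previous policy, which is what allows the scheduling agent to form the stable strategies observed in the convergence curves.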
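The abstract describes a reward function that weighs user throughput against fairness via a weight factor, but does not publish the formula. A minimal sketch of one common way to realize such a reward, combining normalized throughput with Jain's fairness index, is shown below; the weight `alpha` and both terms are illustrative assumptions, not the thesis's actual design.

```python
import numpy as np

def scheduling_reward(throughputs, alpha=0.5):
    """Hypothetical weighted reward for the scheduling agent.

    alpha         -- weight factor: 1.0 emphasizes throughput only,
                     0.0 emphasizes fairness only (an assumption; the
                     thesis does not give its exact formula).
    throughputs   -- per-user throughputs in the current scheduling window.
    """
    t = np.asarray(throughputs, dtype=float)
    total = t.sum()
    if total == 0:
        return 0.0
    # Normalized throughput term in [0, 1]
    thr_term = t.mean() / t.max()
    # Jain's fairness index in [1/n, 1]; equals 1 when all users are equal
    fairness = total ** 2 / (len(t) * (t ** 2).sum())
    return alpha * thr_term + (1 - alpha) * fairness
```

With `alpha = 0.5` a perfectly even allocation scores 1.0, while giving everything to one user scores lower, so the agent is pushed toward allocations that trade off total rate against equal treatment.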
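The abstract also mentions combining an RNN with the Actor network so the agent can exploit movement-trajectory and handover history, and applying an action mask during scheduling. A toy sketch of that idea, using a plain Elman-style recurrent cell and a masked softmax policy head, is given below; the layer sizes, weight initialization, and masking scheme are all illustrative assumptions rather than the thesis's architecture.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class RecurrentActor:
    """Toy recurrent Actor: the hidden state summarizes the time series of
    observations (e.g. trajectory and handover features) before the policy
    head picks a scheduling action over resource blocks."""

    def __init__(self, obs_dim, hidden_dim, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(0.0, 0.1, (hidden_dim, obs_dim))
        self.W_h = rng.normal(0.0, 0.1, (hidden_dim, hidden_dim))
        self.W_out = rng.normal(0.0, 0.1, (n_actions, hidden_dim))
        self.h = np.zeros(hidden_dim)

    def step(self, obs, action_mask):
        # Elman-style recurrent update: new state mixes the current
        # observation with the previous hidden state
        self.h = np.tanh(self.W_in @ obs + self.W_h @ self.h)
        logits = self.W_out @ self.h
        # Action mask: invalid actions (e.g. already-allocated resource
        # blocks) get a large negative logit so their probability is ~0
        logits = np.where(action_mask, logits, -1e9)
        return softmax(logits)
```

Because the hidden state persists across calls to `step`, consecutive observations of the same user influence later action probabilities, which is the property the thesis relies on for highly dynamic scenarios.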