With the increasing demand for data services in the post-5G era, the high operating cost of 5G networks has become an increasingly prominent problem. While networks are being built at high speed, their operating efficiency must also be improved. The number of 5G connections keeps reaching new highs, yet spectrum resources remain limited. An effective resource allocation scheme is therefore needed to improve the utilization of the limited bandwidth, thereby reducing operating costs while ensuring the quality of communication services for users. Intelligent resource allocation and power control schemes are regarded as important means of alleviating the problems caused by the sharp growth in the number of users and in operating costs. Accordingly, based on Multi-Agent Deep Reinforcement Learning (MADRL), this paper studies and explores intelligent schemes for resource allocation and power control in the frequency domain. The main work and contributions of this paper are as follows.

First, this paper proposes a novel algorithm based on multi-agent deep reinforcement learning that jointly optimizes resource block (RB) allocation and power control, with the goal of maximizing the average spectral efficiency (SE) of the system while satisfying quality-of-service constraints. Because centralized training with distributed execution retains the advantages of centralized training while reducing computation and signaling overhead, MADRL technology is adopted. In the proposed MADRL model, the action-value functions of the agents are aggregated through a value decomposition network, which strengthens cooperation between agents and improves the convergence of the algorithm.

Second, this paper innovatively adds a reward discount network to the original MADRL framework to further improve the average spectral efficiency of the proposed algorithm in a multi-cell, multi-user communication environment. The reward discount network adaptively adjusts, in real time, the degree of attention paid to future rewards according to the agent's performance during training, so that the value of the reward discount factor is dynamically tuned to best suit the convergence of the neural network. To prevent the agent from becoming lazy, this paper adds a correction term to the loss function used to train the reward discount network, which pushes the reward discount factor toward larger values and extends the agent's planning horizon. Simulation results show that the proposed algorithm achieves better performance and stability than existing alternatives.
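The value-decomposition step described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes the simple additive aggregation used by value decomposition networks, where the per-agent action values for the selected (RB, power) actions are summed into one joint value for centralized training, while each agent still acts on its own Q-function at execution time. The function name and the example Q-values are hypothetical.

```python
# Minimal sketch of VDN-style value aggregation (assumption: plain
# additive decomposition; the paper's network may differ in detail).

def vdn_joint_q(per_agent_q):
    """Sum per-agent action values for the chosen actions into one
    joint value used by the centralized training loss."""
    return sum(per_agent_q)

# Hypothetical example: three agents' Q-values for their selected
# RB/power actions are combined into a single trainable target.
joint_value = vdn_joint_q([1.2, -0.4, 0.7])
print(joint_value)
```

Because the joint value is a sum, the gradient of the centralized loss flows back to every agent's network, which is one way the aggregation encourages cooperation between agents.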
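The adaptive reward-discount idea can likewise be illustrated with a toy sketch. Everything here is an assumption for illustration: the "network" is reduced to a single sigmoid unit mapping a performance score to a discount factor in (0, 1), and the correction term is written as lambda * (1 - gamma), one simple way to penalize small discount factors and thus discourage the "lazy" short-horizon behavior the abstract mentions. The function names, the linear form, and the weighting constant are all hypothetical.

```python
import math

# Toy stand-in for the reward discount network (assumption: a single
# sigmoid unit; the actual network in the paper is a neural network).
def gamma_from_score(w, b, score):
    """Map a training-performance score to a discount factor in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(w * score + b)))

# Loss with a correction term that grows as gamma shrinks, so training
# favors larger discount factors and a longer planning horizon.
def discount_loss(td_error, gamma, lam=0.1):
    return td_error ** 2 + lam * (1.0 - gamma)
```

The key property of the correction term is monotonicity: for a fixed TD error, a larger gamma yields a strictly smaller loss, so gradient descent on this loss pushes the discount network toward longer-horizon planning rather than letting gamma collapse.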