As the main multiple access method for wireless communication systems, orthogonal frequency division multiple access (OFDMA) divides the transmission bandwidth into a series of orthogonal subcarrier sets and allocates different subcarrier sets to different users at the same time, thus greatly improving system performance. On the other hand, beamforming coordination, a common technique in multi-antenna systems, reduces co-channel interference in cellular networks and increases system capacity by adjusting the beam pattern to maximize the signal-to-interference-plus-noise ratio (SINR) at the output of the beamformer. However, the extent to which the full benefits of OFDMA and beamforming coordination can be realized depends on the efficiency of the resource allocation scheme. In this paper, taking the data transmission rate and other objectives of communication systems into account, we study beamforming coordination and resource allocation for both the uplink and the downlink of multi-cell OFDMA systems. The main work of this paper is as follows.

Firstly, this paper studies beamforming coordination and resource allocation in downlink multi-cell OFDMA systems. The objective is to maximize the sum of the data transmission rates of all users while keeping the transmit power of every base station within its maximum transmit power budget. To address this dynamic optimization problem, we propose a multi-agent deep Q-network (MADQN) method. Based on the channel gain and other information of each user, the method first generates an initial subcarrier allocation, then derives the power allocation and beamforming coordination schemes, and finally updates the resource allocation and beamforming coordination schemes according to the feedback reward, as sketched below. Simulation results show that the MADQN method improves the transmission rates of users in OFDMA systems while satisfying the maximum transmit power budget of the base stations.
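To make the MADQN workflow concrete, the following minimal sketch shows how a single agent could pick a subcarrier with an epsilon-greedy policy over a small Q-network. It is an illustration only: the network architecture, the state contents (channel gains and other per-user information), and hyperparameters such as epsilon are assumptions, not the design used in the paper.

```python
# Minimal sketch of one MADQN agent's epsilon-greedy subcarrier selection.
# All names and sizes (state_dim, num_actions, hidden, epsilon) are
# illustrative assumptions, not the paper's actual hyperparameters.
import random
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, state_dim: int, num_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),  # one Q-value per subcarrier choice
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def select_subcarrier(q_net: QNetwork, state: torch.Tensor,
                      num_actions: int, epsilon: float = 0.1) -> int:
    """Epsilon-greedy action: explore a random subcarrier with probability
    epsilon, otherwise pick the subcarrier with the highest Q-value."""
    if random.random() < epsilon:
        return random.randrange(num_actions)
    with torch.no_grad():
        return int(q_net(state).argmax().item())
```

In a multi-agent setting, each base station or user would hold its own copy of such a network and update it from the reward fed back by the environment.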
Secondly, building on MADQN, we propose a transfer learning-based resource allocation optimization framework called transfer learning MADQN (TL-MADQN). This framework addresses how to train new neural networks quickly and effectively when the communication environment changes. Simulation results show that the TL-MADQN framework speeds up the convergence of the DQN in new communication environments and further improves the data transmission rates of users in OFDMA systems.

Thirdly, this paper studies the joint resource allocation and reconfigurable intelligent surface (RIS) control problem in RIS-enhanced uplink multi-cell OFDMA systems. The objective is to maximize the sum of the data transmission rates of all users while guaranteeing a minimum data rate for every user. For subcarrier allocation, we propose a multi-agent double deep Q-network (MADDQN) method, which mitigates the MADQN method's over-optimistic estimates of the Q-values of some actions. For power allocation and RIS control, we propose a multi-agent deep deterministic policy gradient (MADDPG) method, which avoids the growth of the action-space dimension and the quantization error caused by action discretization in the MADQN method. In addition, we propose a bi-directional transfer learning (BDTL) framework, which improves the convergence speed of the neural networks across communication environments by exploiting the parameters those environments have in common. Simulation results show that: (1) the RIS significantly improves the performance of OFDMA systems; (2) the MADDQN and MADDPG methods improve the data transmission rates of users in OFDMA systems and accelerate the convergence of the neural networks; (3) the BDTL framework further accelerates the convergence of the neural networks across different communication environments. The double-DQN target, the continuous-action actor, and the parameter-transfer idea are sketched below.
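The Q-value overestimation that MADDQN addresses stems from vanilla DQN using the same network both to select and to evaluate the next action. The following sketch contrasts the two target computations; tensor shapes and variable names are assumed for illustration.

```python
# Sketch of the double-DQN target used to curb Q-value overestimation:
# the online network chooses the next action, while the target network
# evaluates it. Batched shapes (B, state_dim) are assumed.
import torch

def dqn_target(reward, next_state, target_net, gamma=0.99):
    # Vanilla DQN: the same max both selects and evaluates the next
    # action, which biases the target upward (over-optimistic estimates).
    with torch.no_grad():
        return reward + gamma * target_net(next_state).max(dim=1).values

def double_dqn_target(reward, next_state, online_net, target_net, gamma=0.99):
    # Double DQN: decouple selection (online_net) from evaluation (target_net).
    with torch.no_grad():
        next_action = online_net(next_state).argmax(dim=1, keepdim=True)
        next_q = target_net(next_state).gather(1, next_action).squeeze(1)
        return reward + gamma * next_q
```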
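MADDPG sidesteps action discretization by letting a deterministic actor emit continuous actions directly. The sketch below assumes one transmit-power output plus one phase shift per RIS element; this output layout and all dimensions are illustrative assumptions, not the paper's architecture.

```python
# Sketch of a MADDPG-style deterministic actor producing continuous
# transmit power and RIS phase shifts, avoiding the action-space blow-up
# and quantization error of a discretized (DQN-style) action set.
import math
import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, state_dim: int, num_ris_elements: int,
                 max_power: float, hidden: int = 128):
        super().__init__()
        self.max_power = max_power
        self.body = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1 + num_ris_elements),
        )

    def forward(self, state: torch.Tensor):
        out = self.body(state)
        # Squash to valid ranges: transmit power in [0, max_power],
        # RIS phase shifts in [0, 2*pi).
        power = torch.sigmoid(out[..., :1]) * self.max_power
        phases = torch.sigmoid(out[..., 1:]) * 2 * math.pi
        return power, phases
```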
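Finally, both TL-MADQN and BDTL rest on reusing learned parameters across communication environments. A minimal sketch of that idea follows, reusing the QNetwork layout from the earlier sketch; which layers to freeze or fine-tune is a hypothetical choice made here for illustration, not taken from the paper.

```python
# Sketch of the parameter-transfer idea behind TL-MADQN and BDTL:
# initialize a new agent's network from one trained in a previous
# environment so shared structure is reused and only environment-specific
# behavior must be relearned.
import torch.nn as nn

def transfer_parameters(source_net: nn.Module, target_net: nn.Module,
                        freeze_shared: bool = False) -> nn.Module:
    # Copy all weights from the source (old-environment) network;
    # assumes both networks share the same architecture.
    target_net.load_state_dict(source_net.state_dict())
    if freeze_shared:
        # Optionally freeze the feature layers and fine-tune only the
        # output head ("net.4" is the last Linear layer in the earlier
        # QNetwork sketch; the name is hypothetical).
        for name, param in target_net.named_parameters():
            if not name.startswith("net.4"):
                param.requires_grad = False
    return target_net
```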