With the growing popularity and development of mobile communication technology, the number of mobile users is rising rapidly and the volume of data is growing exponentially, which poses major challenges for wireless communication networks. Non-orthogonal multiple access (NOMA) introduces power-domain multiplexing, which allows multiple users to transmit signals concurrently on a single block of time and frequency resources. This effectively improves spectrum utilization while increasing the number of wireless devices that can access the network. How to maximize the performance of current NOMA communication systems has been the focus of much research. Most current work on performance optimization analyzes the power allocation problem. However, because of the nature of NOMA, the users in a scenario must be paired with each other in order to transmit information. The user pairing scheme directly affects the transmission rate of each user, which in turn affects the performance of the NOMA system, so finding the optimal user pairing scheme has become a critical problem for further improving performance. Reinforcement learning is a self-adaptive method that learns to take optimal actions and solve optimization problems by interacting with the environment, and its use for problems in communication systems has become more common in recent years.

In this thesis, we conduct an in-depth study, using reinforcement learning, of the optimal user power allocation problem and the optimal user pairing problem in downlink multi-carrier NOMA scenarios, as follows.

First, for the resource allocation problem of a downlink multi-carrier NOMA system with perfect channel state information (CSI), we formulate a joint optimization problem of channel power allocation and user pairing, and then decompose it into two subproblems. The first subproblem, sub-channel power allocation in the single-channel two-user scenario, is analyzed and solved optimally, yielding a closed-form optimal power allocation within a single sub-channel subject to the users' minimum transmission rate requirements. For the second subproblem, user pairing, a reinforcement learning method is used to find the optimal pairing, and a user pairing algorithm based on advantage actor-critic (A2C) is proposed, which can find the optimal user pairing in a short time. Simulation results show that the proposed pairing algorithm yields a larger improvement in system performance than the traditional NOMA pairing method.

Second, for the resource allocation problem of a downlink multi-carrier NOMA system with imperfect CSI, linear minimum mean-square error (LMMSE) estimation is used to derive the achievable rate expression for the users in the scenario, and the corresponding combinatorial optimization problem is formulated. The single-channel power allocation problem is proved to be convex, and the closed-form optimal user power allocation under imperfect CSI is obtained from the Karush-Kuhn-Tucker (KKT) conditions. For the user pairing problem in this scenario, a user pairing algorithm based on proximal policy optimization (PPO), called user triple policy optimization (UTPO), is proposed by improving the actor-critic (AC) framework. Compared with A2C, PPO converges faster and updates the network parameters more thoroughly and rapidly. The results demonstrate that the proposed UTPO user pairing algorithm is approximately 35% more efficient than the A2C-based user pairing algorithm, allowing for higher sum-rate performance.
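
To make the single-sub-channel, two-user power allocation step more concrete, the following is a minimal sketch of a commonly used textbook formulation for perfect CSI. It is not taken from the thesis and may differ from its derivation. It assumes normalized channel gains (|h|^2 divided by noise power), successive interference cancellation at the stronger user, a common minimum rate for both users, and sum-rate maximization as the objective. The function names `two_user_noma_power` and `two_user_noma_rates` are hypothetical. Under these assumptions the sum rate increases with the strong user's power, so the weak user receives just enough power to meet its minimum rate.

```python
import math


def two_user_noma_power(g_strong, g_weak, p_total, r_min):
    """Sum-rate-maximizing power split for one sub-channel (illustrative).

    g_strong, g_weak: normalized channel gains, with g_strong > g_weak > 0.
    p_total: power budget of the sub-channel.
    r_min: minimum rate (bit/s/Hz) required by each user (assumed equal).
    Returns (p_strong, p_weak), or None if the constraints are infeasible.
    """
    if not g_strong > g_weak > 0:
        raise ValueError("expected g_strong > g_weak > 0")
    a = 2.0 ** r_min
    # Largest strong-user power that still leaves the weak user at r_min.
    p_strong_max = ((1.0 + p_total * g_weak) / a - 1.0) / g_weak
    # Smallest strong-user power that gives the strong user r_min.
    p_strong_min = (a - 1.0) / g_strong
    if p_strong_max < p_strong_min:
        return None
    # The sum rate is increasing in p_strong, so take the upper bound.
    p_strong = p_strong_max
    return p_strong, p_total - p_strong


def two_user_noma_rates(g_strong, g_weak, p_strong, p_weak):
    """Achievable rates with SIC at the strong user (perfect CSI assumed)."""
    # Strong user removes the weak user's signal before decoding its own.
    r_strong = math.log2(1.0 + p_strong * g_strong)
    # Weak user treats the strong user's signal as interference.
    r_weak = math.log2(1.0 + p_weak * g_weak / (p_strong * g_weak + 1.0))
    return r_strong, r_weak
```

For example, `two_user_noma_power(10.0, 1.0, 10.0, 1.0)` gives the weak user exactly its minimum rate of 1 bit/s/Hz and assigns the remaining power to the strong user. A pairing algorithm could call such a routine to score each candidate pair by its resulting sum rate.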