| Meeting the low-delay transmission demands of massive numbers of terminals is an urgent problem for fifth-generation (5G) networks, and researchers continue to investigate advanced wireless transmission schemes. Non-orthogonal multiple access (NOMA) is regarded as a promising scheme because of its excellent delay and capacity performance. Specifically, NOMA allows multiple users to send and receive signals on the same time-frequency resource, which can greatly increase spectral efficiency and the number of network connections. In NOMA, multiplexed users are distinguished in the power domain by successive interference cancellation (SIC), so power allocation directly affects the decoding reliability of SIC receivers. Moreover, user pairing affects the energy consumption of SIC receivers, which also deserves further consideration. To bring the performance advantages of NOMA into full play, user pairing should be further studied to improve the power efficiency of NOMA, and power allocation should be optimized to ensure its reliability. In addition, NOMA can be jointly optimized with other 5G technologies so that the techniques complement each other. In summary, this paper studies NOMA from several perspectives. We first consider the resource allocation problem in multi-carrier NOMA (MC-NOMA) systems, which mainly comprises two sub-problems: user pairing and power allocation. Secondly, the joint optimization of NOMA and multiple-input multiple-output (MIMO) is considered. We design a resource allocation scheme for MIMO-NOMA so that the space- and power-domain resources are utilized rationally and the system capacity can be increased ten-fold or even hundred-fold. Finally, the combination of NOMA and mobile edge computing (MEC) is considered. Users can offload partial tasks to the MEC server via NOMA, and the task processing delay can be reduced by optimizing the offloading scheme. The main achievements and contributions of
this paper can be summarized as follows: (1) Firstly, we consider the resource allocation problem in MC-NOMA systems to further improve spectral efficiency. Specifically, the problem can be divided into two sub-problems: power allocation and sub-carrier allocation. To ensure user fairness during resource allocation, we dynamically assign a weight to each user and take weighted sum rate (WSR) maximization as the optimization objective. Edge users can thus be scheduled with higher probability by assigning them relatively larger weights. In addition, so that the weights reflect users’ real-time performance, the weight of each user is set to the reciprocal of its sliding average rate. The tradeoff between user fairness and capacity performance can then be balanced by limiting the length of the sliding window. To solve the resource allocation problem, we propose a new two-step optimization framework: (a) randomly initialize the sub-carrier allocation scheme and solve the power allocation problem within each single sub-carrier; (b) select proper users for each sub-carrier based on the power allocation algorithm so as to improve the total spectral efficiency. In step (a), the power allocation problem on a single sub-carrier is generally non-convex, and it can be transformed into a series of convex problems by the successive convex approximation (SCA) algorithm. The optimal power allocation solution is obtained by solving this series of convex problems with efficient optimization tools. Then, in step (b), the sub-carrier allocation problem can be further divided into several interdependent sub-problems. The dynamic programming (DP) algorithm stores the solution of each sub-problem, which helps to optimize sub-carrier allocation recursively. Note that we do not follow the traditional criterion of pairing strong users with weak users when optimizing user pairing. Instead, WSR maximization is
taken as the optimization objective to balance capacity performance and user fairness. Because DP reuses the stored sub-problem solutions, the computational complexity is effectively reduced. (2) Secondly, to further reduce the computational complexity, we introduce deep reinforcement learning (DRL) to help optimize resource allocation in NOMA. Specifically, the process of solving the resource allocation problem can be represented as a Markov decision process (MDP), which can be optimized by DRL algorithms such as the twin delayed deep deterministic policy gradient (TD3) algorithm. The neural network is trained offline to generate a multi-task learning model. By designing a suitable mapping, we can obtain feasible power allocation and user pairing solutions simultaneously from the network outputs. This achieves the joint optimization of power allocation and user pairing, which helps to improve the system performance of the final solution. Moreover, we design a new reward function that evaluates the network outputs from two perspectives: the optimization objective and the constraints. In this way, the network outputs are kept within the feasible domain of the optimization problem and finally converge to the global optimum. Once the network converges, the resource allocation strategy can be obtained online with linear complexity. (3) In addition, we consider the joint optimization of MIMO and NOMA to further improve system capacity. In MIMO-NOMA systems, users are divided into several clusters, and we assume that users within the same cluster share the same analog beam. Based on multi-user detection (MUD), the user clusters are orthogonal to each other, which helps to eliminate inter-cluster interference. The resource allocation problem in MIMO-NOMA systems can be further divided into three sub-problems: beamforming, user pairing and power allocation. To solve the three
sub-problems with lower complexity, we propose a three-step optimization framework. In the first step, we randomly initialize the user pairing scheme and search for the optimal analog beam of each cluster. Owing to the orthogonality among user clusters, the beamforming problem can be considered for each cluster independently. Here, for simplicity, the power allocation scheme is obtained by fractional transmit power control (FTPC). The beamforming problem can then be transformed into an unconstrained optimization problem, whose optimal solution is obtained by a quasi-Newton algorithm. When optimizing user pairing, we first generate a series of candidate pairing solutions. In generating the candidates, we jointly consider the channel differences and correlation characteristics among users in the same cluster, so as to achieve the joint optimization of MIMO and NOMA. The capacity performance of each candidate can be evaluated via the beamforming algorithm, and the optimal pairing solution could be found by exhaustive search. In this paper, we replace exhaustive search with the particle swarm optimization (PSO) algorithm to further reduce the computational complexity. In addition, power allocation can be formulated as a difference of convex (DC) programming problem, which can be solved by SCA. (4) Finally, we consider the joint optimization of NOMA and MEC. In NOMA-MEC networks, we assume that each user offloads part of its tasks to the edge server for processing and processes the remaining tasks locally. We propose an iterative two-user NOMA scheme for task offloading, which effectively reduces the task offloading delay. In addition, we introduce the deep deterministic policy gradient (DDPG) algorithm to optimize users’ task offloading ratios and offloading power. The neural network is trained offline to generate a multi-task learning model. We introduce an upper bound on users’ offloading power, and all power variables can then be normalized to the
range [0, 1]. In this way, the ratio and power variables lie in the same range, so that the learning model can generate feasible solutions for all variables simultaneously. In addition, we design a new reward function to evaluate the performance of the learning model, which accounts for both the optimization objective and the constraints. This well-designed reward function provides appropriate feedback to the neural network, guiding its search in the right direction. |
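
As a minimal sketch of the normalization and reward design described for the NOMA-MEC contribution, the Python fragment below decodes actor outputs in [0, 1] into offloading ratios and offloading powers, and scores a solution with an objective term minus a constraint penalty. The function names `decode_action` and `reward`, the layout of the action vector (ratios first, then normalized powers), the `penalty` weight, and the use of negative delay as the objective term are all illustrative assumptions, not the formulation used in the paper.

```python
import numpy as np


def decode_action(action, p_max):
    """Map actor outputs to offloading ratios and powers.

    Assumed layout: the first half of `action` holds offloading ratios,
    the second half holds powers normalized by the upper bound `p_max`.
    """
    # Clip so that every variable stays inside the common range [0, 1].
    action = np.clip(np.asarray(action, dtype=float), 0.0, 1.0)
    k = action.size // 2
    ratios = action[:k]           # fraction of each task sent to the edge server
    powers = action[k:] * p_max   # undo the normalization using the upper bound
    return ratios, powers


def reward(delay, violation, penalty=10.0):
    """Objective term (negative delay) minus a penalty on constraint violation.

    `violation` is a non-negative measure of how far a solution lies outside
    the feasible domain; the weight `penalty` is a hypothetical tuning knob.
    """
    return -delay - penalty * max(violation, 0.0)
```

For two users, `decode_action([0.2, 0.5, 1.0, 0.25], p_max=2.0)` yields one ratio and one power per user. Because infeasible outputs receive a strictly lower reward than feasible ones with the same delay, the penalty term steers the search toward the feasible domain.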