As global wireless spectrum resources become increasingly scarce, non-orthogonal multiple access (NOMA) and device-to-device (D2D) communication have emerged as key 5G technologies that greatly improve spectrum utilization. Next-generation networks face massive access demands from diverse scenarios, heterogeneous services, and large user populations. They must also manage resources in heterogeneous networks where multiple communication scenarios converge. To meet both needs, D2D-NOMA has been developed, and it makes fair access feasible for a wider range of users. The high degree of resource reuse in D2D-NOMA improves system capacity and service capability, but it also introduces more complex interference. Effective resource allocation and interference management have therefore become core problems that urgently need to be solved. In addition, the evolution from 5G to 6G places a growing emphasis on high energy efficiency. Heterogeneous cellular networks serve large numbers of users and consume a great deal of energy. It is therefore necessary to study the energy efficiency of D2D-NOMA communication in heterogeneous cellular networks and to allocate resources efficiently, through both user grouping and power allocation, so as to improve system energy efficiency.

For the power allocation problem in D2D-NOMA-based heterogeneous cellular networks, a Dinkelbach-enhanced twin delayed deep deterministic policy gradient (TD3) algorithm that maximizes energy efficiency is proposed. First, based on the constructed network model, system energy efficiency maximization is formulated as an optimization problem subject to user rate constraints. Because this problem is non-convex, fractional, and NP-hard, a deep reinforcement learning (DRL) framework is proposed whose reward functions incorporate linear and nonlinear parameter optimization. This framework makes DRL better suited to optimization decision-making and improves training and interaction efficiency. Second, the Dinkelbach method is used to preprocess the original fractional problem. This reduces the action space and effectively improves the performance and efficiency of the combined DRL algorithm. Moreover, TD3 is a more advanced DRL algorithm than conventional ones such as the deep deterministic policy gradient (DDPG). It mitigates the accumulation of training errors and the overestimation bias typical of single-Q networks. Simulation results show that, once trained, the proposed algorithm makes decisions efficiently. It also keeps learning and updating its policy as the network changes, retaining the flexibility of DRL. Compared with DDPG, the deep Q-network (DQN), and other algorithms, it significantly improves network energy efficiency and closely approaches the optimal energy efficiency obtained by exhaustive search.

For the joint problem of D2D-NOMA user grouping and transmit power allocation in heterogeneous cellular networks, two schemes are proposed to further improve energy efficiency: a K-iteration user grouping scheme based on the Kuhn-Munkres (KM) algorithm, and a joint resource management scheme. Under the D2D-NOMA-based heterogeneous cellular network model, the user grouping problem is first formulated as a perfect matching problem on a weighted bipartite graph, taking user QoS constraints and other factors into account. The weight matrix is transformed with energy efficiency as the objective, cases with different numbers of users are classified, and K iterations are performed. A joint resource management scheme is then proposed to optimize energy efficiency over the coupled user grouping and power allocation problems. It builds on the proposed user grouping and power allocation schemes, and it relies on the trained Dinkelbach-enhanced TD3 to reduce the complexity and latency of decision-making. Simulation results show that, with a large number of users, the proposed user grouping scheme achieves higher energy efficiency than a greedy algorithm. The proposed joint power allocation and user grouping scheme also outperforms joint schemes based on DDPG and other algorithms.
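The Dinkelbach preprocessing described above turns a fractional energy-efficiency objective, rate divided by consumed power, into a sequence of subtractive subproblems max R(p) - lambda * (p + P_c). The subproblems are solved repeatedly while lambda is updated to the current energy efficiency. The abstract does not give the system model, so the following is a minimal sketch under a loudly hypothetical toy setting. It assumes a single link with rate log2(1 + g*p), a circuit power `p_circuit`, and a power budget `p_max`. The names `rate`, `best_power`, and `dinkelbach_ee` are illustrative and are not taken from the thesis. In this toy case the inner subproblem has a closed-form solution. The sketch does not model the DRL (TD3) agent that, according to the abstract, makes the power decisions in the proposed scheme.

```python
import math


def rate(p: float, g: float) -> float:
    """Hypothetical link model: spectral efficiency log2(1 + g*p) in bit/s/Hz."""
    return math.log2(1.0 + g * p)


def best_power(lam: float, g: float, p_max: float) -> float:
    """Solve the Dinkelbach subproblem max_{0 <= p <= p_max} rate(p) - lam * p.

    The objective is concave in p. Setting its derivative
    g / ((1 + g*p) * ln 2) - lam to zero gives p = 1/(lam * ln 2) - 1/g,
    which is clipped to [0, p_max]. The constant term -lam * p_circuit
    does not change the maximizer, so it is omitted here.
    """
    if lam <= 0.0:
        return p_max  # rate increases with p, so full power is optimal
    p = 1.0 / (lam * math.log(2.0)) - 1.0 / g
    return min(max(p, 0.0), p_max)


def dinkelbach_ee(g: float, p_max: float, p_circuit: float,
                  tol: float = 1e-9, max_iter: int = 100):
    """Maximize EE(p) = rate(p) / (p + p_circuit) by Dinkelbach iteration.

    Returns (p, ee): the power found and its energy efficiency.
    """
    lam = 0.0  # current energy-efficiency estimate
    p = p_max
    for _ in range(max_iter):
        p = best_power(lam, g, p_max)
        # F(lam) = max_p rate(p) - lam * (p + p_circuit); it reaches 0 at the optimal lam.
        f = rate(p, g) - lam * (p + p_circuit)
        if f < tol:
            break
        lam = rate(p, g) / (p + p_circuit)  # Dinkelbach update
    return p, rate(p, g) / (p + p_circuit)


# Example with hypothetical parameters: gain 10, 2 W budget, 0.5 W circuit power.
p_opt, ee_opt = dinkelbach_ee(g=10.0, p_max=2.0, p_circuit=0.5)
print(f"p* = {p_opt:.4f} W, EE* = {ee_opt:.4f} bit/s/Hz/W")
```

Each update moves lambda up to the energy efficiency of the latest power choice. The iteration stops once no feasible power can beat the current ratio, which is the standard Dinkelbach optimality condition F(lambda) = 0.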
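The user grouping step reduces each matching round to a maximum-weight perfect matching on a weighted bipartite graph, which the Kuhn-Munkres (Hungarian) algorithm solves exactly in O(n^3) time. The following is a minimal sketch of one such round only; it does not implement the K-iteration scheme itself. The rows are assumed to be cellular users and the columns D2D pairs reusing their resources. Each entry is an assumed energy-efficiency weight for that pairing, and QoS-infeasible pairings carry a large hypothetical penalty. The function name, the penalty value, and the weight matrix are illustrative and not taken from the thesis.

```python
import math
from typing import List

QOS_PENALTY = -1e6  # hypothetical weight that rules out QoS-infeasible pairings


def km_max_weight_matching(weight: List[List[float]]) -> List[int]:
    """Return match with match[i] = column assigned to row i, maximizing total weight.

    Square matrix assumed. Runs the O(n^3) Hungarian (Kuhn-Munkres) algorithm
    with dual potentials on the cost matrix -weight (max weight = min cost).
    """
    n = len(weight)
    u = [0.0] * (n + 1)  # row potentials (1-indexed; index 0 is a sentinel)
    v = [0.0] * (n + 1)  # column potentials
    p = [0] * (n + 1)    # p[j] = row matched to column j (0 = unmatched)
    way = [0] * (n + 1)  # previous column on the alternating path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [math.inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], math.inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = -weight[i0 - 1][j - 1] - u[i0] - v[j]  # reduced cost
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            # Shift potentials so at least one new edge becomes tight.
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # reached a free column: augmenting path found
                break
        # Flip matched/unmatched edges along the path back to the sentinel.
        while j0 != 0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [0] * n
    for j in range(1, n + 1):
        match[p[j] - 1] = j - 1
    return match


# Hypothetical EE weights: rows = cellular users, columns = D2D pairs.
W = [
    [5.0, 4.0, QOS_PENALTY, 1.0],
    [4.5, 1.0, 2.0, QOS_PENALTY],
    [QOS_PENALTY, 3.0, 2.5, 2.0],
    [1.0, QOS_PENALTY, 3.5, 0.5],
]
grouping = km_max_weight_matching(W)
print(grouping, sum(W[i][j] for i, j in enumerate(grouping)))
```

This small matrix shows why exact matching matters. A naive row-by-row greedy choice takes the 5.0 entry first and ends with a total of 10.5. The optimal perfect matching, found by enumerating all permutations, totals 14.0.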