| Dense heterogeneous networks improve spectral efficiency by allowing multiple communication systems to share spectrum, but they also create serious interference management problems. Cognitive radio is an effective way to address these problems, so dense heterogeneous cognitive networks are needed to meet the demands of future wireless communications. As a low-cost technology for improving spectrum utilization, Device-to-Device (D2D) communication has received extensive attention from academia and industry. D2D communication lets neighboring users establish a direct link, which effectively improves spectrum utilization. At the same time, D2D communication that shares cellular spectrum introduces serious co-channel interference, especially when users are densely distributed. Interference management for D2D communication in dense heterogeneous cognitive networks is therefore an urgent problem. Cognitive radio offers an effective solution: acting as the cognitive network, D2D communication intelligently reuses the licensed spectrum of the cellular network, which can eliminate the interference between the primary network and the cognitive network. As a key technology of cognitive radio, wireless network resource allocation aims to mitigate interference and improve resource utilization efficiency through reasonable allocation. It is therefore necessary to study resource allocation for D2D communication in dense heterogeneous cognitive networks. This thesis is supported by the Chinese Natural Science Foundation project "Sensing and Utilization of Polarization Resource in a Dense Heterogeneous Wireless Environment" (No. 61571062). The thesis analyzes the co-channel interference generated by D2D communication in dense heterogeneous cognitive networks and studies resource allocation algorithms for D2D communication to solve this interference management problem. A location-aware hypergraph-based spectrum allocation algorithm and two deep
reinforcement learning-based resource allocation algorithms are proposed. The main research contents and contributions are as follows. First, for dense heterogeneous cognitive networks, a D2D spectrum allocation algorithm based on a location-aware hypergraph is proposed to address the cumulative interference caused by densely distributed D2D links sharing spectrum with cellular users. The algorithm uses a hypergraph to model the co-layer cumulative interference among multiple D2D users and a location-aware region to model the cross-layer cumulative interference from multiple D2D users to cellular users. It then applies a hypergraph coloring algorithm to allocate spectrum to D2D users, maximizing system capacity while guaranteeing the communication quality of cellular users. Simulation results show that the proposed algorithm reduces the outage probability of the cellular user to nearly zero while achieving higher system capacity than existing graph-theory-based methods. The proposed algorithm can therefore be applied to resource allocation for D2D communication in dense heterogeneous cognitive networks, where it reliably protects the communication quality of cellular users and improves spectrum utilization. Second, in dense heterogeneous cognitive networks, existing algorithms suffer from large signaling overhead and high computational complexity because of the sharp increase in the number of devices. This thesis therefore proposes two distributed resource allocation algorithms based on deep reinforcement learning. These algorithms give each D2D communication pair cognitive ability, so that it can select spectrum and control power autonomously. The two algorithms model the distributed spectrum allocation and power control problem of D2D communication from the perspectives of single-agent and multi-agent reinforcement learning, respectively, learn strategies from users' historical data
to guide immediate resource allocation, use location-aware regions to select state features, and design a reward function that punishes strategies causing harmful interference to cellular users and D2D users and rewards strategies that increase D2D link capacity. A single-agent deep reinforcement learning model and a multi-agent deep reinforcement learning model are established to learn individual optimization strategies and cooperative optimization strategies, respectively. Simulation results show that the individual optimization strategy converges faster, while the cooperative optimization strategy achieves better system performance. The two algorithms are therefore suited, respectively, to resource allocation in rapidly changing communication environments and in scenarios with high communication performance requirements. Both still protect the communication quality of cellular users and D2D users while improving spectrum utilization, without requiring global channel state information. |
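
The reward design described in the abstract (punish harmful interference to cellular and D2D users, reward higher D2D link capacity) can be illustrated with a minimal Python sketch. This is not the thesis's actual reward function: the function name `d2d_reward`, the SINR thresholds, the fixed penalty value, and the use of Shannon capacity per unit bandwidth are all assumptions made for illustration.

```python
import math


def d2d_reward(d2d_sinr, cellular_sinr, neighbor_d2d_sinrs,
               cellular_sinr_min=5.0, d2d_sinr_min=3.0, penalty=-1.0):
    """Hypothetical per-step reward for one D2D pair.

    All SINR values are linear ratios (not dB). The thresholds and the
    penalty are illustrative assumptions, not values from the thesis.
    """
    # Punish actions that cause harmful interference to the cellular user.
    if cellular_sinr < cellular_sinr_min:
        return penalty
    # Punish actions that cause harmful interference to other D2D users.
    if any(s < d2d_sinr_min for s in neighbor_d2d_sinrs):
        return penalty
    # Otherwise reward the D2D link capacity (bit/s/Hz, Shannon formula).
    return math.log2(1.0 + d2d_sinr)


# Example: a harmless action is rewarded with the link capacity,
# a harmful one receives the fixed penalty.
good = d2d_reward(d2d_sinr=3.0, cellular_sinr=10.0, neighbor_d2d_sinrs=[4.0])
bad = d2d_reward(d2d_sinr=3.0, cellular_sinr=1.0, neighbor_d2d_sinrs=[4.0])
```

In a single-agent setting each D2D pair would evaluate such a reward from its own observations; in a multi-agent setting the same idea could be combined across pairs, although how the thesis does this is not stated in the abstract.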