Driven by sixth-generation mobile communication technology, the growing number of real-time communication applications places extremely high requirements on the transmission rate, delay, and reliability of wireless ad hoc networks. Designing a dynamic resource allocation scheme for such networks under limited wireless communication resources is therefore particularly important. However, previous work has not fully studied how the dynamics of wireless ad hoc networks affect dynamic resource allocation schemes. In view of this, this thesis aims to design intelligent dynamic resource allocation policies under unknown network dynamics for two representative wireless ad hoc networks: the Internet of Vehicles (IoV) and unmanned aerial vehicle (UAV) networks. The main work can be summarized in the following two parts:

(1) In an IoV network, we consider the dynamics of vehicular mobility and study a joint spectrum and power allocation optimization problem in which vehicular users and cellular users coexist. The objective is to minimize the sum of the average Age of Information (AoI) of all V2V and V2I links and the average power consumption of all vehicular user pairs. By adopting the trust region policy optimization (TRPO) algorithm, which can ensure monotonic improvement across policy iterations, we propose an AoI-aware joint spectrum and power dynamic allocation scheme that pursues this objective in an unknown dynamic environment. Simulation results verify the superiority of the proposed scheme over the baseline schemes in terms of average cumulative reward, convergence speed, and stability.

(2) In a UAV network, we consider the dynamics caused by the high mobility of UAVs and study a dynamic spectrum allocation optimization problem in which external malicious jamming and co-channel mutual interference exist simultaneously. By adopting the soft Q-learning with mutual-information regularization (SQLMI) algorithm, which can conduct targeted exploration of the action space, together with a multi-agent collaborative framework, we propose a dynamic spectrum allocation scheme based on the multi-agent collaborative SQLMI algorithm to maximize the average throughput of the UAV network in an unknown dynamic environment. Simulation results verify the superiority of the proposed scheme over the baseline schemes in terms of average throughput, convergence speed, and stability.
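
To make the "targeted exploration" of SQLMI in part (2) more concrete, the sketch below shows the policy form commonly associated with soft Q-learning with mutual-information regularization: the action distribution is proportional to a learned action prior multiplied by the exponentiated Q-values, and the prior is moved gradually toward the policy. This is an illustrative assumption rather than the thesis's implementation. The function names, the three-channel example, the Q-values, the inverse temperature `beta`, and the `step_size` are all hypothetical.

```python
import math


def mi_regularized_policy(q_values, prior, beta):
    """Return pi(a|s) proportional to prior[a] * exp(beta * q_values[a]).

    Assumed form of the mutual-information-regularized soft policy;
    beta is a hypothetical inverse-temperature parameter.
    """
    # Subtract the largest Q-value for numerical stability; it cancels on normalization.
    q_max = max(q_values)
    weights = [p * math.exp(beta * (q - q_max)) for p, q in zip(prior, q_values)]
    total = sum(weights)
    return [w / total for w in weights]


def update_prior(prior, policy, step_size):
    """Move the action prior toward the current policy (a running marginal estimate)."""
    return [(1.0 - step_size) * p + step_size * pi for p, pi in zip(prior, policy)]


# Hypothetical example: one UAV agent choosing among three channels.
q_values = [1.0, 2.0, 0.5]        # assumed Q-value estimates, one per channel
prior = [1.0 / 3.0] * 3           # uniform initial action prior
policy = mi_regularized_policy(q_values, prior, beta=2.0)
prior = update_prior(prior, policy, step_size=0.1)
```

Because the prior multiplies the exponentiated Q-values, actions that have been chosen often keep a higher probability, so exploration concentrates on a subset of promising channels instead of spreading uniformly over the action space. With `beta` set to zero the policy reduces to the prior itself.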