| The development of mobile communication technology has created favorable conditions for real-time Internet of Things (IoT) applications, but the high complexity of these applications exacerbates the resource constraints of IoT devices. At the same time, the existing mobile edge computing (MEC) architecture relies on base stations with limited network coverage, so it struggles to satisfy the communication and computing demands of IoT devices, which are unevenly distributed in space and time. As a supplement to the existing MEC architecture, MEC based on unmanned aerial vehicles (UAVs) has therefore attracted increasing attention. By deploying communication and computing resources on UAVs, a UAV-assisted MEC network can achieve seamless resource coverage and flexible resource allocation. However, the dynamic changes in channels, tasks, and relative locations between UAVs and devices, together with resource constraints, make task offloading, position scheduling, and resource allocation more difficult. This thesis therefore adopts deep reinforcement learning (DRL) to explore the interplay among task offloading, position scheduling, and resource allocation in mobile edge computing, in order to cope with the dynamics and resource constraints of the scenario. The main research contents of this thesis are as follows: (1) For the scenario where IoT devices are concentrated in one area and continuously generate complex computing tasks, a system model of a single UAV assisting the edge computing of multiple IoT devices is built, taking into account the dynamic communication and computing needs of the IoT devices as well as energy constraints, with the goal of maximizing the task processing success rate and energy efficiency of the IoT devices. To explore the interplay among the UAV’s position scheduling, computing resource allocation, and the IoT devices’ task offloading, to improve the task processing success rate and energy efficiency, and to enable an overall evaluation of the joint
strategies, an optimization scheme based on Multi-Agent Deep Deterministic Policy Gradient (MADDPG) is proposed. Simulation results show that the proposed scheme achieves a higher success rate and energy efficiency than schemes based on a fixed resource allocation strategy or on single-agent deep reinforcement learning algorithms. (2) For the scenario where the resource demands of IoT devices are spatially unevenly distributed and environmental information is incomplete, a system model of a single UAV assisting the edge computing of multiple IoT devices is built, taking into account the UAV’s communication, computing, and energy constraints, with the goal of minimizing the UAV’s energy consumption and overall service time. So that the UAV explores sufficiently to handle the unpredictability of the scenario and quickly learns a trajectory optimization strategy, this thesis proposes a soft actor-critic algorithm combined with an intrinsic curiosity module (ICM-SAC). Adding intrinsic rewards to compensate for sparse environment rewards improves the UAV’s exploration capability. Simulation results show that the proposed algorithm clearly outperforms the other benchmark schemes in terms of the UAV’s energy consumption and overall service time. (3) For the scenario where a large number of IoT devices continuously generate computing requirements, a system model in which multiple UAVs, combined with base stations, assist the edge computing of multiple IoT devices is built, taking into account the UAVs’ coverage and computing resource constraints and the timeliness of each task, with the goal of maximizing the task processing success rate and timeliness of the IoT devices. To cope with the real-time communication and computing needs of the IoT devices and the challenges posed by load imbalance among the UAVs, a soft actor-critic algorithm based on multi-agent reinforcement learning (MASAC) is proposed to jointly optimize the position scheduling and task offloading strategies of the UAVs, as well as to explore
the offloading matching relationships between UAVs and IoT devices. Simulation results show that, compared with optimization schemes based on other multi-agent reinforcement learning algorithms, the proposed scheme significantly improves the task processing success rate and timeliness. |