In recent years, the research and development of unmanned aerial vehicles (UAVs) has matured and manufacturing costs have fallen significantly, so UAVs are now widely used across many fields. In edge computing systems, their high mobility and ease of deployment allow UAVs to act as edge servers, providing task offloading, emergency communication, and other services to ground users. Because the distribution of ground users and their service demands evolve dynamically, fixed, pre-defined solutions cannot deliver efficient trajectory planning and resource scheduling for UAV edge computing systems. Exploiting the distributed decision-making capability of UAV swarms and the strengths of multi-agent reinforcement learning in complex multi-agent scenarios, each UAV is treated as an intelligent agent that observes real-time environmental information and dynamically adjusts its service strategy, improving edge-service efficiency while making full use of the UAVs' limited energy. However, in UAV edge computing systems the training and coordination information exchanged among agents must be transmitted over wireless channels, so offloading traffic and multi-agent training compete for communication resources, substantially degrading both service efficiency and spectrum utilization. To address this issue, this paper first models the UAV edge computing system and formulates an optimization problem that jointly considers user experience and energy utilization efficiency. A two-layer algorithmic structure is then proposed to solve the optimal trajectory planning and resource scheduling problem. The UAV layer adopts the MDMADDPG (Memory-Driven Multi-Agent Deep Deterministic Policy Gradient) algorithm for trajectory planning, yielding an optimal combination of service and training topologies; the user layer uses a simulated annealing algorithm to solve the user access-point selection and spectrum scheduling problem, maximizing the system's service efficiency and energy utilization efficiency. Experimental results demonstrate that the proposed trajectory planning and resource scheduling algorithms reduce overall system energy consumption while guaranteeing user experience, outperforming comparison algorithms.

Furthermore, during UAV offloading service, some UAVs may deplete their energy, or changing user demands may cause service nodes to be replaced or the number of serving UAVs to vary; retraining the deep neural networks of the new UAVs from scratch would consume significant energy and degrade service quality. To address this issue, this paper examines scenarios in which UAV swarms change and applies transfer learning, which exploits prior knowledge, to the training of the MDMADDPG network, accelerating agent training and saving energy in the UAV network. Experimental results show that the transfer-learning-based training approach greatly improves the convergence speed of the UAV networks and saves training energy, further enhancing service efficiency across varying scenarios.
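As a rough illustration of the UAV-layer training described above, the sketch below shows the standard MADDPG update that MDMADDPG builds on: each UAV agent has a decentralized actor and a centralized critic that sees all agents' observations and actions. The abstract does not specify the memory-driven component, the state/action design, or any hyperparameters, so the network sizes, reward layout, and toy batch here are illustrative assumptions only (target networks and the memory module are omitted).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

N_UAV, OBS_DIM, ACT_DIM, GAMMA = 3, 8, 2, 0.99  # illustrative sizes, not from the paper

class Actor(nn.Module):
    """Maps one UAV's local observation to a continuous action (e.g., a 2-D velocity)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, ACT_DIM), nn.Tanh())
    def forward(self, obs):
        return self.net(obs)

class Critic(nn.Module):
    """Centralized critic: scores the joint observations and actions of all UAVs."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(N_UAV * (OBS_DIM + ACT_DIM), 64), nn.ReLU(),
                                 nn.Linear(64, 1))
    def forward(self, obs_all, act_all):
        return self.net(torch.cat([obs_all, act_all], dim=-1))

actors = [Actor() for _ in range(N_UAV)]
critics = [Critic() for _ in range(N_UAV)]
actor_opts = [torch.optim.Adam(a.parameters(), lr=1e-3) for a in actors]
critic_opts = [torch.optim.Adam(c.parameters(), lr=1e-3) for c in critics]

def update_agent(i, obs, act, rew, next_obs):
    """One MADDPG update for agent i; tensors are (batch, N_UAV, dim)."""
    B = obs.shape[0]
    # Critic: one-step TD target built from every agent's next action.
    # (A full implementation would use slowly-updated target networks here.)
    with torch.no_grad():
        next_act = torch.stack([actors[j](next_obs[:, j]) for j in range(N_UAV)], dim=1)
        y = rew[:, i:i+1] + GAMMA * critics[i](next_obs.reshape(B, -1),
                                               next_act.reshape(B, -1))
    q = critics[i](obs.reshape(B, -1), act.reshape(B, -1))
    critic_loss = F.mse_loss(q, y)
    critic_opts[i].zero_grad(); critic_loss.backward(); critic_opts[i].step()
    # Actor: ascend the centralized Q with respect to agent i's own action.
    act_cols = [act[:, j] for j in range(N_UAV)]
    act_cols[i] = actors[i](obs[:, i])  # differentiable w.r.t. agent i's policy
    actor_loss = -critics[i](obs.reshape(B, -1), torch.cat(act_cols, dim=-1)).mean()
    actor_opts[i].zero_grad(); actor_loss.backward(); actor_opts[i].step()

# Toy batch of transitions (in practice sampled from a replay buffer).
B = 32
update_agent(0, torch.randn(B, N_UAV, OBS_DIM), torch.rand(B, N_UAV, ACT_DIM) * 2 - 1,
             torch.randn(B, N_UAV), torch.randn(B, N_UAV, OBS_DIM))
```

The centralized-critic/decentralized-actor split is what lets each UAV execute with only local observations at service time while still learning coordinated trajectories during training.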
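The user-layer step can be pictured with a generic simulated annealing skeleton like the one below. The paper's actual cost function (combining user experience and energy utilization) and its neighborhood moves are not given in this abstract, so `toy_cost` and the single-user perturbation are placeholder assumptions.

```python
import math
import random

def anneal_assignment(n_users, n_uavs, cost, T0=1.0, alpha=0.95, steps=2000):
    """Select an access UAV for each user by simulated annealing.
    `cost` maps an assignment (one UAV index per user) to a scalar system
    cost; its exact form is the paper's and is not reproduced here."""
    assign = [random.randrange(n_uavs) for _ in range(n_users)]
    cur = best = cost(assign)
    best_assign, T = assign[:], T0
    for _ in range(steps):
        cand = assign[:]
        cand[random.randrange(n_users)] = random.randrange(n_uavs)  # move one user
        c = cost(cand)
        # Always accept improvements; accept worse moves with Boltzmann probability,
        # which shrinks as the temperature T cools.
        if c < cur or random.random() < math.exp((cur - c) / T):
            assign, cur = cand, c
            if c < best:
                best_assign, best = cand[:], c
        T *= alpha  # geometric cooling schedule
    return best_assign, best

# Toy usage: a placeholder cost that simply balances load across 3 UAVs.
def toy_cost(assign):
    loads = [assign.count(k) for k in range(3)]
    return max(loads) - min(loads)

print(anneal_assignment(n_users=10, n_uavs=3, cost=toy_cost))
```

The annealing schedule lets the user layer escape locally good but globally poor assignments early on, then settle into a near-optimal access-point and spectrum allocation as the temperature drops.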
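The transfer learning idea for replaced UAVs can likewise be sketched: copy a trained agent's weights into the new agent's network and fine-tune only the later layers, so the newcomer does not learn from scratch. The layer split, learning rate, and helper names below are illustrative assumptions, not the paper's exact transfer scheme.

```python
import torch
import torch.nn as nn

def warm_start(trained: nn.Module, fresh: nn.Module, freeze_prefix="0.", lr=1e-4):
    """Initialize a replacement UAV's network from a trained one and return an
    optimizer that fine-tunes only the unfrozen layers. `freeze_prefix` names
    the early feature layers to keep fixed; this split is an assumption for
    illustration."""
    fresh.load_state_dict(trained.state_dict())  # transfer prior knowledge
    tunable = []
    for name, p in fresh.named_parameters():
        if name.startswith(freeze_prefix):
            p.requires_grad = False  # reuse transferred early-layer features as-is
        else:
            tunable.append(p)
    return torch.optim.Adam(tunable, lr=lr)

# Toy usage with a small policy network standing in for an MDMADDPG actor.
def make_policy():
    return nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 2), nn.Tanh())

old_actor, new_actor = make_policy(), make_policy()
opt = warm_start(old_actor, new_actor)  # fine-tune only the output layers
```

Because only the later layers are updated, far fewer gradient steps (and thus far fewer wireless training exchanges) are needed before the new UAV's policy converges, which is the source of the energy savings the abstract reports.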