| Intelligent sensors with computing capability can make control decisions by monitoring changes in the physical environment, and thus show great potential for automatically detecting early signs of public hazards such as wildfires. However, although such sensors significantly reduce labor costs, it is difficult to achieve high-precision hazard identification by relying only on sensor devices with limited power and computing resources. Owing to their deployment flexibility, UAVs can provide computing services close to the sensor devices from a bird's-eye view. Teaming UAVs with sensors therefore yields a low-cost yet effective edge computing system for responsive environmental monitoring. In such a system, energy efficiency and decision timeliness are key performance indicators, so designing energy-efficient computation offloading and path planning strategies that still guarantee timely decisions is a problem worth studying. To this end, this thesis first discretizes the UAV movement direction, fixes the movement speed as a constant, and studies the UAV path planning strategy. Then, to model the energy consumption of the UAVs and enable flexible control, the movement direction and speed are relaxed to continuous variables, and a path planning and computation offloading strategy based on federated reinforcement learning is studied. The main contributions of this thesis are as follows: (1) In the UAV-assisted edge computing system, the energy consumption of power-limited sensor devices deserves particular attention. With the goal of reducing the total energy consumption of the sensor devices, this thesis models UAV path planning, the energy consumption of computation offloading, and offloading timeliness, and formulates an optimization problem for sensor device energy efficiency subject to an offloading timeliness constraint. In this part, the UAV movement direction is discretized and the movement speed is fixed. Because task arrivals and computation offloading in the system are dynamic and random, the problem is a long-term stochastic optimization problem. This thesis designs a centralized path planning strategy based on deep reinforcement learning; after centralized offline training, the network model makes online decisions for multiple UAVs. Simulations verify the competitiveness of this strategy in improving device energy efficiency and offloading timeliness. (2) In the UAV-assisted edge computing system, this thesis further considers the energy consumption of the UAVs and extends the flight direction and speed to continuous variables, establishing an optimization model for the total system energy consumption subject to an offloading timeliness constraint. The problem is both a long-term stochastic optimization problem and a mixed-integer nonlinear program. Because existing reinforcement learning algorithms can handle only purely discrete or purely continuous action variables, this thesis adopts the reparameterization method to handle the mixed (discrete and continuous) policy. Moreover, as the action space of a centralized learning algorithm grows, the network structure becomes more complex and harder to converge. Therefore, distributed federated learning is used for network training, and convergence is promoted through model parameter sharing. After distributed training, the network model at each UAV can make independent real-time decisions. Simulation results show that the distributed path planning and computation offloading strategy based on federated reinforcement learning designed in this thesis offers advantages in improving system energy efficiency and offloading timeliness. |
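To make the first setting concrete (discretized movement direction, constant speed), the sketch below shows how a discrete action index could map to a UAV position update within one time slot. The number of headings, the speed, and the slot length are hypothetical values chosen for illustration; the abstract does not state them, and the thesis may define the action set differently (for example, with an additional hover action).

```python
import math

# Hypothetical parameters: the abstract only says direction is discretized
# and speed is constant; these values are illustrative assumptions.
NUM_DIRECTIONS = 8   # assumed number of discrete headings
SPEED = 10.0         # assumed constant UAV speed (m/s)
SLOT = 1.0           # assumed time-slot length (s)


def step(position, action):
    """Move a UAV for one time slot along discrete heading `action` at constant speed."""
    if not 0 <= action < NUM_DIRECTIONS:
        raise ValueError("action must index one of the discrete headings")
    # Headings are spaced evenly around the circle: 0 = east, then counter-clockwise.
    angle = 2 * math.pi * action / NUM_DIRECTIONS
    x, y = position
    distance = SPEED * SLOT
    return (x + distance * math.cos(angle), y + distance * math.sin(angle))
```

With this encoding a deep reinforcement learning agent only has to choose among `NUM_DIRECTIONS` outputs per UAV, which keeps the action space small enough for a centralized discrete-action network.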
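The abstract says reparameterization is used to handle a mixed discrete and continuous policy but does not give the exact construction. One common way to do this is sketched below, under that assumption: a Gumbel-softmax relaxation for a discrete choice (here a hypothetical offloading decision among `K` options) and a Gaussian reparameterization, squashed with `tanh`, for continuous heading and speed. All names and ranges are illustrative. The key property is that each sample is a deterministic function of the policy outputs (`logits`, `mu`, `log_std`) plus independent noise, which is what lets gradients pass through sampling in an autodiff framework; NumPy is used here only to show the forward computation.

```python
import numpy as np


def sample_hybrid_action(logits, mu, log_std, tau, rng, max_speed):
    """Reparameterized sample of a hybrid action (hypothetical sketch).

    logits:  scores for K discrete options (assumed offloading decision)
    mu, log_std: Gaussian parameters for [heading, speed] before squashing
    tau:     Gumbel-softmax temperature
    """
    # Discrete part: Gumbel-softmax, soft = softmax((logits + g) / tau), g ~ Gumbel(0, 1)
    u = rng.uniform(low=1e-10, high=1.0, size=logits.shape)
    g = -np.log(-np.log(u))
    z = (logits + g) / tau
    z = z - z.max()                        # numerical stability
    soft = np.exp(z) / np.exp(z).sum()     # relaxed one-hot vector
    choice = int(np.argmax(soft))          # hard decision used when acting

    # Continuous part: a = mu + sigma * eps, eps ~ N(0, 1), then squash to bounds
    eps = rng.standard_normal(mu.shape)
    squashed = np.tanh(mu + np.exp(log_std) * eps)   # values in [-1, 1]
    heading = np.pi * squashed[0]                     # radians in [-pi, pi]
    speed = max_speed * (squashed[1] + 1.0) / 2.0     # in [0, max_speed]
    return choice, soft, heading, speed
```

Because the noise is drawn separately from the parameters, reusing the same random generator state reproduces the same action, and changing `logits` or `mu` changes the action smoothly (apart from the final `argmax`).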
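The abstract states that federated learning promotes convergence through model parameter sharing, without specifying the aggregation rule. A minimal sketch, assuming FedAvg-style weighted averaging of every network parameter across UAV agents followed by broadcasting the shared model back, is shown below. Whether the thesis averages uniformly, weights agents by their amount of experience, or shares only some layers is not stated, so the `weights` argument and the full-model sharing are assumptions.

```python
import numpy as np


def federated_average(local_params, weights=None):
    """Weighted average of each named parameter array across agents (FedAvg-style)."""
    if not local_params:
        raise ValueError("need at least one local model")
    if weights is None:
        weights = [1.0] * len(local_params)   # assumed: uniform weighting by default
    total = float(sum(weights))
    keys = local_params[0].keys()
    return {
        k: sum(w * p[k] for w, p in zip(weights, local_params)) / total
        for k in keys
    }


def federated_round(local_params, weights=None):
    """Aggregate local models, then give every UAV its own copy of the shared model."""
    global_params = federated_average(local_params, weights)
    return [{k: v.copy() for k, v in global_params.items()} for _ in local_params]
```

Only parameters are exchanged, never raw observations, and after training each UAV keeps its own copy of the network, which matches the abstract's point that each UAV can then make independent real-time decisions.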