A traditional Wireless Sensor Network (WSN) is a wireless network composed, in a self-organizing way, of a large number of sensor nodes with wireless communication capability and Cluster Head (CH) nodes. The environmental monitoring data generated by the sensor nodes are transmitted through the network in a multi-hop fashion and gathered at the cluster head nodes. The cluster head nodes, which have stronger computing and communication capabilities, further aggregate the data and upload it to the Internet. However, because the wireless transmission power of the sensor nodes is limited, the delay and stability of data transmission between nodes are difficult to guarantee. Moreover, it is very difficult to deploy wireless communication facilities such as base stations in open-pit mining areas and disaster areas with complex terrain, which makes it impossible to build a traditional wireless sensor network there. UAVs are characterized by low manufacturing and operating costs and strong mobility. By mounting small wireless base stations on UAVs and collecting data from ground sensor nodes in a single-hop manner, the problem of limited communication resources in the ground sensor network can be effectively alleviated. However, realizing UAV-assisted wireless sensor network data collection still faces many difficulties and challenges, mainly: how to jointly optimize the UAV flight trajectories and the network resource scheduling policy so as to minimize the completion time of the data collection task and minimize the data loss of the sensor network. Based on the above analysis, the specific research contents of this paper are as follows:

Research Content 1: Under the Line-of-Sight (LoS) channel model, a mathematical model that minimizes the completion time of the data collection task is established by optimizing the UAV flight trajectory and the association strategy between the UAV and the ground nodes. This paper designs an environment state feature extraction method for this class of problems and develops a simulated training environment. A Deep Reinforcement Learning (DRL) algorithm is used to train interactively with this environment, and after many iterations an agent that is good at solving this problem is obtained. Simulation results show that the algorithm is superior to traditional optimization methods in both performance and generalization ability.

Research Content 2: The Rician fading channel model is used in place of the line-of-sight channel model, and the scenario in which sensor node data are generated continuously is considered. In this scenario, multi-UAV distributed cooperative trajectory planning and air-network resource scheduling are jointly optimized, and a mathematical model that minimizes the data loss rate of the wireless sensor network is established. Since the problem has multi-agent and non-terminating characteristics, this paper adopts a Multi-Agent Deep Reinforcement Learning (MADRL) algorithm to solve it, and redesigns the reward mechanism and the environment state processing method in the algorithm. The effectiveness of the algorithm is verified through simulation and analysis.
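To make the setting of Research Content 1 concrete, the following is a minimal sketch, not the thesis algorithm: a single UAV on a toy grid must visit every sensor node, each move costs one time step, and tabular Q-learning (a simple stand-in for the DRL method) learns a trajectory that minimizes the completion time. The grid size, sensor positions, and all hyperparameters are illustrative assumptions.

```python
import random

# Toy stand-in for the single-UAV data-collection problem: the UAV
# starts at (0, 0) and must visit every sensor; reward is -1 per move,
# so maximizing return minimizes completion time.  All values here are
# illustrative assumptions, not the thesis design.
GRID = 4                                   # 4x4 flight area
SENSORS = ((0, 3), (3, 0), (3, 3))         # hypothetical sensor positions
ACTIONS = ((0, 1), (0, -1), (1, 0), (-1, 0))

def step(pos, remaining, a):
    """One environment transition: move, clip to the grid, collect data."""
    x = min(max(pos[0] + a[0], 0), GRID - 1)
    y = min(max(pos[1] + a[1], 0), GRID - 1)
    remaining = tuple(s for s in remaining if s != (x, y))
    return (x, y), remaining, -1.0, not remaining   # reward -1 per step

def train(episodes=4000, alpha=0.5, gamma=0.95, eps=0.2, seed=0):
    """Tabular Q-learning over (UAV position, remaining sensors) states."""
    rng = random.Random(seed)
    Q = {}
    for _ in range(episodes):
        pos, remaining = (0, 0), SENSORS
        for _ in range(60):                 # cap episode length
            s = (pos, remaining)
            qs = Q.setdefault(s, [0.0] * len(ACTIONS))
            a = (rng.randrange(len(ACTIONS)) if rng.random() < eps
                 else max(range(len(ACTIONS)), key=qs.__getitem__))
            pos, remaining, r, done = step(pos, remaining, ACTIONS[a])
            nq = Q.setdefault((pos, remaining), [0.0] * len(ACTIONS))
            target = r + (0.0 if done else gamma * max(nq))
            qs[a] += alpha * (target - qs[a])
            if done:
                break
    return Q

def greedy_completion_time(Q):
    """Roll out the learned greedy policy; return steps to collect all data."""
    pos, remaining = (0, 0), SENSORS
    for t in range(1, 61):
        qs = Q.get((pos, remaining), [0.0] * len(ACTIONS))
        a = max(range(len(ACTIONS)), key=qs.__getitem__)
        pos, remaining, _, done = step(pos, remaining, ACTIONS[a])
        if done:
            return t
    return None

print(greedy_completion_time(train()))
```

The thesis instead uses a deep network with a learned state-feature extraction over continuous trajectories; this tabular version only illustrates the objective, that shorter collection tours yield higher return.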
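The data-loss-rate objective of Research Content 2 can likewise be sketched with a deterministic toy simulation, again under illustrative assumptions: nodes generate data every time step into finite buffers, UAVs drain the buffer of the node they currently serve (here via a simple round-robin schedule, not the thesis's learned policy), and any arrival to a full buffer is lost.

```python
# Minimal sketch of the data-loss metric: continuous data generation
# into finite buffers, with overflow counted as loss.  Rates, buffer
# capacity, and the round-robin service policy are assumptions made
# only for illustration.
def data_loss_rate(gen_rates, capacity, service_rate, uavs, horizon):
    buffers = [0.0] * len(gen_rates)
    generated = lost = 0.0
    for t in range(horizon):
        # each node produces new data; overflow beyond capacity is lost
        for i, r in enumerate(gen_rates):
            generated += r
            free = capacity - buffers[i]
            lost += max(0.0, r - free)
            buffers[i] = min(capacity, buffers[i] + r)
        # each UAV serves one node per step (round-robin stand-in policy)
        for u in range(uavs):
            i = (t * uavs + u) % len(gen_rates)
            buffers[i] = max(0.0, buffers[i] - service_rate)
    return lost / generated

# One under-provisioned node: service cannot keep up, so data are lost.
print(data_loss_rate([5.0], capacity=2.0, service_rate=1.0, uavs=1, horizon=2))
```

In the thesis this loss rate is the quantity the MADRL reward mechanism is redesigned around; the non-terminating nature of the scenario corresponds to the unbounded `horizon` here.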