
Research On UAV Cooperation Strategy Based On Reinforcement Learning

Posted on: 2024-09-27
Degree: Master
Type: Thesis
Country: China
Candidate: X D Tang
Full Text: PDF
GTID: 2532307148963079
Subject: Cyberspace security
Abstract/Summary:
In recent years, technological development has led to a rapid increase in the number of devices connected to wireless communication networks, creating a shortage of ground communication resources that degrades the quality of service and communication performance of ground users. The use of UAVs as aerial base stations in wireless communication systems has therefore attracted much attention, but existing research often relies on deterministic optimization and requires complete environmental information. Reinforcement Learning (RL) algorithms can interact with the environment, learn optimal strategies, and explore unknown environments, so research on RL-based UAV communication systems in edge network environments is of great significance.

Previous research on UAV-assisted emergency communication typically required known environmental information, yet in disaster-relief scenarios the environment is complex and dynamically changing. In addition, previous studies focused only on optimizing the UAVs, neglecting the optimization of the ground equipment. To address these issues, a reinforcement-learning-based solution is proposed that jointly optimizes the UAVs and the ground devices through collaboration between edge agents and UAV agents, providing cost-effective temporary infrastructure communication for end users in complex environments. The problem is first modeled and an optimal solution is derived from the observable workload and link connectivity; a cooperative-learning solution is then proposed, including the collaborative design of the edge agents and the UAV agent. In the experimental comparison, the proposed scheme achieves the lowest energy consumption across various task-load distribution scenarios. Across different terrain environments, it reduces energy consumption by 22% and improves energy efficiency by 29% compared with the other schemes. In the comparison of ground-node metrics, it achieves the best energy consumption and delay performance, reducing the energy consumption of single-hop nodes by 5.7% relative to the comparison scheme.

In previous multi-agent UAV-assisted communication schemes, some approaches treat each agent as an independent individual and ignore the presence of other agents in the environment, which makes the environment non-stationary. In addition, the action space of a UAV is continuous, and discretizing the actions degrades performance; as the number of agents increases, the input dimension of the network becomes very large, making the network difficult to train and converge. To address these issues, a continuous-control algorithm based on value-function decomposition is proposed. By combining the actor-critic algorithm with the QMIX algorithm, each UAV agent obtains a continuous action space and the optimal policy can be extracted from the joint action-value function. Specifically, an actor makes the action decision for each UAV based on the environmental conditions, a mixing network learns the joint action-value function, and the joint action-value function is further decomposed to obtain a critic for each agent that evaluates the actor's action decisions. In the experimental comparison, with three UAVs serving 30 edge nodes, this scheme achieves a 28.6% throughput improvement over the comparison scheme, and across different terrain environments it increases throughput by 14.6%.
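The second contribution pairs per-UAV actors with a QMIX-style mixing network so that continuous actions can be evaluated through a decomposable joint action-value function. The following is a minimal PyTorch sketch of that structure only; the class names, layer sizes, and toy dimensions (three UAVs, 10-dimensional local observations, 2-dimensional actions) are illustrative assumptions, not details taken from the thesis.

```python
# Minimal sketch of actor-critic + QMIX-style value decomposition (assumed setup,
# not the thesis implementation). Requires PyTorch.
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Per-UAV policy: maps a local observation to a bounded continuous action."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh())

    def forward(self, obs):
        return self.net(obs)

class Critic(nn.Module):
    """Per-agent critic: scores the agent's own (observation, action) pair."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))

class Mixer(nn.Module):
    """QMIX-style monotonic mixing: combines per-agent Q values into a joint Q_tot
    using non-negative mixing weights generated from the global state."""
    def __init__(self, n_agents, state_dim, embed=32):
        super().__init__()
        self.n_agents, self.embed = n_agents, embed
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed)
        self.hyper_b1 = nn.Linear(state_dim, embed)
        self.hyper_w2 = nn.Linear(state_dim, embed)
        self.hyper_b2 = nn.Linear(state_dim, 1)

    def forward(self, agent_qs, state):                 # agent_qs: [batch, n_agents]
        b = agent_qs.size(0)
        w1 = torch.abs(self.hyper_w1(state)).view(b, self.n_agents, self.embed)
        b1 = self.hyper_b1(state).view(b, 1, self.embed)
        h = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(b, self.embed, 1)
        b2 = self.hyper_b2(state).view(b, 1, 1)
        return (torch.bmm(h, w2) + b2).view(b, 1)       # Q_tot

# Toy forward pass: 3 UAVs, each with a 10-d observation and a 2-d continuous action.
n_agents, obs_dim, act_dim, batch = 3, 10, 2, 4
state_dim = n_agents * obs_dim                          # global state = concatenated observations
actors  = [Actor(obs_dim, act_dim) for _ in range(n_agents)]
critics = [Critic(obs_dim, act_dim) for _ in range(n_agents)]
mixer   = Mixer(n_agents, state_dim)

obs   = torch.randn(batch, n_agents, obs_dim)
state = obs.view(batch, -1)
acts  = torch.stack([actors[i](obs[:, i]) for i in range(n_agents)], dim=1)
qs    = torch.cat([critics[i](obs[:, i], acts[:, i]) for i in range(n_agents)], dim=1)
q_tot = mixer(qs, state)                                # joint action value used for TD and actor updates
print(q_tot.shape)                                      # torch.Size([4, 1])
```

In this sketch the per-agent critics would be trained through the temporal-difference error on Q_tot, and each actor would be updated by backpropagating through its own critic, which is the general shape of the decomposition described above.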
Keywords/Search Tags:Edge network, UAV, Reinforcement learning, Multi-agent reinforcement learning