| Unmanned aerial vehicles (UAVs) are expected to play an important role in building an integrated air, space, sea, and ground communication network in 6G because they are simple to control, flexible, and adaptable to a variety of communication environments. Compared with a fixed base station on the ground, a UAV base station not only obtains good channel conditions more easily, but can also flexibly adjust its location and flight path according to the distribution of users on the ground, thereby achieving better communication quality. UAV base stations are considered to have important application prospects and value in scenarios such as emergency communication recovery, short-term communication enhancement for large gatherings, and communication coverage of remote areas. In recent years, the trajectory design problem in UAV-assisted communication has attracted the attention of many scholars. As UAVs are studied and applied in more assisted-communication scenarios, research on their trajectory design also faces many new demands and problems. This paper analyzes the trajectory design problem in the single-UAV scenario and the UAV-cluster scenario respectively, aiming to maximize UAV data throughput and improve UAV communication performance. Drawing on reinforcement learning and its advantages in solving problems in complex scenarios, the following problems are studied: (1) The trajectory design problem in the single-UAV base station scenario is studied first. Under the influence of ground user mobility, the deployment position of the base station is adjusted dynamically by optimizing the flight trajectory of the UAV, so as to improve the data throughput of the UAV and guarantee the quality of service of users. Using time discretization, this paper reduces the trajectory design to solving for a finite number of optimal discrete location points of the UAV base station. The optimization of a single location point is modeled as a Markov decision process, and a value-based deep 
reinforcement learning model is adopted to solve it. As time advances, the movement of ground users causes the interaction environment of reinforcement learning to change over time. To overcome the influence of this time-varying environment on the model, a scheme is proposed that enhances the generalization ability of the model and fine-tunes the neural network parameters as the environment changes, thereby completing the design of the entire flight path of the UAV. (2) The paper then considers the multi-UAV base station communication scenario in the presence of co-channel interference, and coordinates the operation of the multiple UAV base stations by jointly optimizing their flight paths. The optimization objective is to maximize the minimum data throughput among the UAVs in the cluster, so as to improve the communication performance of every UAV base station. Because the UAVs must cooperate to cover ground users while competing with each other to maximize their own throughput, a complex relationship of cooperation and competition arises, and the joint multi-trajectory design problem becomes a challenging Markov game. Considering the influence of user mobility, this paper first proposes a method of periodic user clustering and UAV channel allocation. On this basis, a multi-agent deep reinforcement learning model is adopted to optimize the flight paths of the multiple UAVs. Finally, simulation results show that the proposed design scheme improves the throughput performance of all UAVs. |
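The max-min throughput objective described for the multi-UAV scenario can be illustrated with a minimal sketch. It is not the paper's system model: the distance-based path-loss law, the equal bandwidth split among a UAV's users, the assumption that all UAVs share one channel, and every name and default value (`min_uav_throughput`, `p_tx`, `g0`, `alpha`, `noise`, `bandwidth`) are assumptions made for illustration only.

```python
import math


def min_uav_throughput(uav_pos, user_pos, assoc, p_tx=0.1, noise=1e-13,
                       bandwidth=1e6, g0=1e-4, alpha=2.0):
    """Return (minimum per-UAV throughput, list of per-UAV throughputs) in bit/s.

    uav_pos  : list of (x, y, z) UAV positions in metres (hypothetical layout)
    user_pos : list of (x, y, z) ground-user positions in metres
    assoc    : assoc[k] is the index of the UAV serving user k
    All channel parameters are illustrative assumptions, not values from the paper.
    """
    rates = [0.0] * len(uav_pos)
    for k, user in enumerate(user_pos):
        # Received power at this user from every UAV (simple distance-based path loss).
        rx = [p_tx * g0 / math.dist(u, user) ** alpha for u in uav_pos]
        serving = assoc[k]
        # Every non-serving UAV is treated as a co-channel interferer.
        interference = sum(rx) - rx[serving]
        sinr = rx[serving] / (interference + noise)
        # The serving UAV's bandwidth is split equally among its users.
        n_served = assoc.count(serving)
        rates[serving] += bandwidth / n_served * math.log2(1.0 + sinr)
    return min(rates), rates


# Two UAVs at 100 m altitude, each hovering above its own user.
uavs = [(0.0, 0.0, 100.0), (200.0, 0.0, 100.0)]
users = [(0.0, 0.0, 0.0), (200.0, 0.0, 0.0)]
worst, per_uav = min_uav_throughput(uavs, users, assoc=[0, 1])
print(worst, per_uav)
```

A joint trajectory optimizer would evaluate a quantity like `worst` at each discrete time step and move the UAVs to increase it. In this toy setup, moving the two UAV-user pairs farther apart lowers the mutual interference and raises the minimum throughput.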