
Trajectory Optimization Of UAV In Wireless Energy Harvesting Network Based On Reinforcement Learning

Posted on: 2022-10-29 | Degree: Master | Type: Thesis
Country: China | Candidate: T Y Jia | Full Text: PDF
GTID: 2492306560991809 | Subject: Software engineering
Abstract/Summary:
In recent years, owing to their cost-effectiveness, on-demand operation, and flexible deployment, unmanned aerial vehicles (UAVs) have been widely used in many fields. In communications in particular, a UAV can establish a good line-of-sight (LoS) link with ground users, which greatly improves the efficiency of information transmission. A UAV can also carry a fog server to help ground users complete computing tasks, or a radio-frequency (RF) signal transmitter that charges the ground users wirelessly. When user-related information (such as user location and transmit power) changes dynamically, it is challenging to maximize the network capacity by optimizing the flight trajectories of UAVs that have limited propulsion energy.

Existing works mainly use convex optimization or reinforcement learning to optimize UAV trajectories. Because the formulated trajectory-optimization problem is highly complex and non-convex, convex methods require many first-order Taylor expansions and variable substitutions, and the resulting solutions often deviate from reality; such methods also struggle in dynamically changing environments. In contrast, reinforcement learning offers low dependence on an environment model, real-time interaction with the environment, strong exploration ability, and high model reusability. Its basic idea is that an agent interacts with the environment and uses reward feedback to decide its action in the next time slot, which makes it well suited to highly dynamic UAV-assisted communication scenarios.

This thesis investigates a UAV-assisted mobile edge computing network. It first considers the UAV trajectory design when ground mobile users upload data to the UAV using their own limited battery capacity, and then further considers the case in which the ground mobile users must be charged by the UAV. The contributions are as follows.

(1) For a UAV-assisted mobile edge computing network in which the ground users move randomly, upload data to the UAV, and have sufficient power supply, the UAV schedules only one user to upload data in each time slot according to the users' upload requirements. An optimization problem is formulated to maximize the total amount of data uploaded to the UAV by jointly optimizing the user scheduling, the UAV trajectory, and the upload power of each user device, subject to the UAV's energy constraint and each user's quality-of-service constraint. The problem is then cast as a Markov decision process. A reward function based on the users' upload gain and an energy penalty is designed, and a DQN-based UAV trajectory-optimization framework is built on it to improve the network capacity. Comparisons of the average return under different hyperparameter values verify that the proposed algorithm outperforms conventional Q-learning in convergence and throughput while guaranteeing the users' quality of service.

(2) For the scenario in which the mobile users' battery capacity is limited, the UAV provides mobile edge computing services while also acting as an RF energy source that charges the users. The charging process adopts a nonlinear energy-harvesting model that better matches the characteristics of practical circuits. An adaptive ε-greedy algorithm based on DDQN is designed to solve the problem. Simulation results show that the proposed algorithm outperforms conventional algorithms in effectiveness and convergence; in particular, the adaptive ε-greedy strategy outperforms the conventional ε-greedy strategy.
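The abstract describes the first contribution's reward as an upload gain combined with an energy penalty, without giving the exact form. A minimal sketch of one plausible per-slot reward, with illustrative weights `lam` and `qos_penalty` that are not taken from the thesis:

```python
def step_reward(bits_uploaded, energy_used, qos_violated,
                lam=0.5, qos_penalty=10.0):
    """One-slot reward: data-upload gain minus a weighted propulsion/
    transmission energy penalty, with an extra penalty whenever a user's
    quality-of-service constraint is violated. The weights lam and
    qos_penalty are illustrative assumptions, not values from the thesis."""
    r = bits_uploaded - lam * energy_used
    if qos_violated:
        r -= qos_penalty
    return r
```

A reward of this shape lets the DQN agent trade throughput against the UAV's limited propulsion energy while still being pushed to respect user service quality.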
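The second contribution uses a nonlinear energy-harvesting model "more in line with the characteristics of the practical circuit" but does not specify it. A widely used candidate is the sigmoidal (logistic) nonlinear EH model; the sketch below assumes that model, with saturation power `M` and circuit parameters `a`, `b` set to illustrative values rather than values from the thesis:

```python
import math

def harvested_power(p_in, M=0.024, a=150.0, b=0.014):
    """Sigmoidal nonlinear energy-harvesting model (an assumption; the
    thesis does not name its exact model).

    p_in : RF input power at the user (W)
    M    : maximum harvestable power at circuit saturation (W)
    a, b : circuit-dependent shape parameters (illustrative values)
    """
    psi = M / (1.0 + math.exp(-a * (p_in - b)))   # logistic circuit response
    omega = 1.0 / (1.0 + math.exp(a * b))         # offset: zero input -> zero output
    return max((psi - M * omega) / (1.0 - omega), 0.0)
```

Unlike the linear model, this function saturates at `M` for large input power and vanishes at zero input, which is what makes it a closer fit to real rectifier circuits.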
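The abstract names an adaptive ε-greedy strategy on top of DDQN without describing the adaptation rule. One plausible sketch, assuming a rule that decays ε over episodes but boosts exploration when recent returns lag well behind the best return seen so far (all thresholds and rates here are hypothetical):

```python
import random

def adaptive_epsilon(episode, avg_return, best_return,
                     eps_max=1.0, eps_min=0.05, decay=0.995):
    """Adaptive exploration rate: exponential decay over episodes,
    boosted when the recent average return falls clearly below the
    best return observed so far. The rule and constants are
    illustrative assumptions, not taken from the thesis."""
    eps = max(eps_min, eps_max * decay ** episode)
    if best_return > 0 and avg_return < 0.8 * best_return:
        eps = min(1.0, eps * 1.5)   # explore more when learning stalls
    return eps

def epsilon_greedy(q_values, eps):
    """Pick a random action with probability eps, else the greedy action."""
    if random.random() < eps:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Coupling the exploration rate to observed returns in this way is one common reason an adaptive ε-greedy schedule converges faster than a fixed-decay one, consistent with the comparison reported in the abstract.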
Keywords/Search Tags: UAV network, Edge computing, Random mobility, Reinforcement learning, Radio-frequency energy harvesting