| In recent years, with the development of big data, the Internet of Things (IoT), and related technologies, the number of mobile terminals has kept increasing, leading to explosive growth in network data traffic. Emerging applications place a heavy computing burden on terminal devices, and the future IoT must support real-time communication and computing for large numbers of low-power wireless devices. Relieving this computing burden while keeping devices operating sustainably has therefore become an urgent technical problem. By combining Wireless Power Transfer (WPT) and Mobile Edge Computing (MEC), a WPT-MEC system can provide wireless terminal devices with rich computing resources and a sustainable energy supply, significantly prolonging system lifetime while computing tasks are completed. Because the various resources are coupled in time and space, designing a reasonable resource allocation and task scheduling strategy is challenging. Studying task scheduling strategies for the IoT that combine WPT and MEC is therefore of great practical significance. Most existing research focuses on one-time optimization under a static wireless channel or a given user computing task, that is, short-term optimization, and ignores the time variability and dynamics of the energy harvesting process and of computing task arrivals. Taking dynamic network information into account and aiming to improve the computing capability of the system, this thesis studies the optimization of scheduling strategies for IoT devices in WPT-MEC systems. First, the local computing frequency and offloading ratio of all IoT devices are jointly optimized to minimize the average maximum computing delay over multiple time slots. Because the optimization problem is non-convex and correlated across time slots, it is difficult to obtain the optimal solution with traditional optimization algorithms. This thesis therefore proposes a reinforcement learning method based on Deep Deterministic Policy Gradient (DDPG) to solve the dynamic optimization problem. Simulation results demonstrate the convergence and effectiveness of the proposed DDPG algorithm: compared with the OnlyLocal-Computing algorithm and the Deep Q-Network (DQN) algorithm, it reduces the average maximum computing delay by 73.2% and 16.8%, respectively. These results show that the DDPG algorithm copes effectively with the time variability of the channel and the uncertainty of task arrivals, and improves the computing capability of the system. Second, as the number of tasks to be processed grows, a single wireless Access Point (AP) may be unable to meet the computing and energy supply requirements. This thesis therefore proposes a task offloading model for a WPT-MEC system with multiple APs and optimizes the offloading strategy with the goal of improving the computing capability of the system. The matching between IoT devices and APs, together with the local computing frequency and offloading power of the IoT devices, is jointly optimized to maximize the total number of bits computed by the system over multiple time slots. Because the optimization variables include both discrete and continuous variables and the objective function is correlated across time slots, traditional optimization methods cannot find a globally optimal solution. This thesis therefore proposes a Parameterized Deep Q-Network (P-DQN) algorithm based on deep reinforcement learning that optimizes the discrete and continuous variables simultaneously. Simulation results verify the convergence and effectiveness of the proposed P-DQN algorithm: the proposed strategy outperforms the benchmark algorithms and comes close to the theoretical optimum. The P-DQN algorithm adapts to the dynamic network environment and improves the computing capability of the system. |
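
To make the first objective concrete, the sketch below evaluates an "average maximum computing delay" for given decisions. It is only an illustration under assumptions that the abstract does not state: a partial-offloading model in which the local and offloaded parts of a task run in parallel, a fixed uplink rate per slot, and no energy-harvesting constraint. The names `Device`, `device_delay`, and `average_max_delay`, and all of their fields, are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Device:
    # All fields are illustrative assumptions, not the thesis's notation.
    task_bits: float        # task size in bits for this slot
    cycles_per_bit: float   # CPU cycles needed to process one bit
    local_freq_hz: float    # local CPU frequency (decision variable)
    offload_ratio: float    # fraction of bits offloaded, in [0, 1] (decision variable)
    uplink_rate_bps: float  # assumed achievable uplink rate in this slot
    edge_freq_hz: float     # assumed edge CPU frequency given to this device


def device_delay(d: Device) -> float:
    """Delay of one device, assuming local and offloaded parts run in parallel."""
    local_bits = (1.0 - d.offload_ratio) * d.task_bits
    offload_bits = d.offload_ratio * d.task_bits
    # Skip a branch entirely when it has no bits, to avoid dividing by zero.
    local_delay = 0.0
    if local_bits > 0:
        local_delay = local_bits * d.cycles_per_bit / d.local_freq_hz
    offload_delay = 0.0
    if offload_bits > 0:
        transmit = offload_bits / d.uplink_rate_bps
        execute = offload_bits * d.cycles_per_bit / d.edge_freq_hz
        offload_delay = transmit + execute
    return max(local_delay, offload_delay)


def average_max_delay(slots: List[List[Device]]) -> float:
    """Mean over time slots of the largest per-device delay in each slot."""
    per_slot_max = [max(device_delay(d) for d in devices) for devices in slots]
    return sum(per_slot_max) / len(per_slot_max)
```

In a reinforcement learning formulation of this kind, the negative of the per-slot maximum delay could serve as the reward, with `local_freq_hz` and `offload_ratio` as the continuous actions an agent such as DDPG outputs; that mapping is likewise an assumption rather than the thesis's stated design.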