Mobile Edge Computing (MEC) task offloading, an important technology in the Internet of Things (IoT), can free wireless devices (WDs) from computationally intensive tasks. Offloading heavy tasks to nearby edge servers for computation effectively avoids task overflow and excessive waiting delays on wireless devices. Existing research usually models the edge computing task offloading problem as a convex optimization problem and solves it iteratively, which is computationally expensive and offers little adaptivity. Reinforcement learning excels at both improving the intelligence of decisions and enhancing offloading performance, enabling adaptive, high-quality decision making in complex environments. To further exploit the advantages of reinforcement learning in task offloading decisions and to improve decision speed and offloading efficiency, this paper investigates new reinforcement learning algorithms for the edge computing task offloading problem. The main work and innovations are as follows.

First, the edge computing task offloading problem is modeled as a binary decision problem. Separate mathematical models are designed for computing a task locally and for offloading it to an edge server. A suitable wireless fading channel dataset is used as the input to the algorithm, and a suitable reward calculation is derived from the mathematical model, laying the foundation for the algorithm design that follows.

Second, a Difference-Constrained Proximal Policy Optimization (DCPO) algorithm is proposed, which stabilizes network training by constraining the difference between successive updates of structurally identical policy networks. It achieves joint optimization of low-latency and energy-efficient communication while improving data utilization. The algorithm is adaptive to some extent and effectively reduces manual intervention in the model. According to the simulation results, the computational rate of DCPO is improved by about 12% compared with the baseline algorithms in the comparison tests.

Third, a new distributed Asynchronous Update Reinforcement Learning-based Offloading (ARLO) algorithm is proposed. This is a distributed learning method consisting of five sub-networks and a common network. Each sub-network has the same structure; each interacts with its own environment to learn, jointly updates the common network, and pulls the common network's parameters back at regular intervals. Each sub-network is equipped with an experience pool, which minimizes the correlation between training data and is also effective in preventing the model from falling into local optima. Once trained, the five threads can run in parallel and handle tasks from different users simultaneously. The experimental results show that the proposed algorithm performs well in edge computing task offloading: it not only saves time compared with traditional mathematical optimization, but also improves the computational rate by about 20% over ordinary reinforcement learning algorithms and by about 5% over DCPO.
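The binary offloading model described in the first work item can be sketched as a weighted time-and-energy cost comparison between local and edge execution. The sketch below is a minimal illustration, not the paper's actual formulation: the parameter values, the effective-capacitance constant `kappa`, and the Shannon-rate uplink model are common assumptions in the MEC literature that we adopt here for concreteness.

```python
import math

def local_cost(cycles, f_local, kappa=1e-27):
    """Local execution: latency = cycles / CPU frequency; energy ~ kappa * f^2 * cycles.
    kappa is an assumed effective-capacitance coefficient, common in MEC models."""
    latency = cycles / f_local
    energy = kappa * (f_local ** 2) * cycles
    return latency, energy

def offload_cost(bits, cycles, bandwidth, channel_gain, p_tx, f_edge, noise=1e-10):
    """Offloading: uplink rate from the Shannon formula, then edge-server execution."""
    rate = bandwidth * math.log2(1 + p_tx * channel_gain / noise)  # bits/s
    t_up = bits / rate                 # time to transmit the task input
    t_exec = cycles / f_edge           # time to compute on the edge server
    energy = p_tx * t_up               # device energy spent transmitting
    return t_up + t_exec, energy

def decide(bits, cycles, f_local, bandwidth, gain, p_tx, f_edge, w_t=0.5, w_e=0.5):
    """Binary decision: 0 = compute locally, 1 = offload, by weighted time+energy cost."""
    t_l, e_l = local_cost(cycles, f_local)
    t_o, e_o = offload_cost(bits, cycles, bandwidth, gain, p_tx, f_edge)
    c_l = w_t * t_l + w_e * e_l
    c_o = w_t * t_o + w_e * e_o
    return (1, c_o) if c_o < c_l else (0, c_l)
```

In a reinforcement learning formulation, the negative of the chosen weighted cost would serve as the per-step reward, which is how a reward calculation can be derived from such a mathematical model.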
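The stabilization idea behind DCPO, constraining how far each update can move the policy, can be illustrated with the standard PPO clipped surrogate objective. This is a generic sketch of that family of constraints, not the paper's DCPO loss; the clipping radius `eps` and the use of log-probability ratios are assumptions borrowed from PPO.

```python
import numpy as np

def clipped_surrogate(new_logp, old_logp, advantages, eps=0.2):
    """PPO-style objective: clip the probability ratio between the new and old
    policies to [1-eps, 1+eps] so a single update cannot change the policy too much."""
    ratio = np.exp(new_logp - old_logp)                     # pi_new / pi_old per sample
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return np.mean(np.minimum(unclipped, clipped))          # pessimistic bound
```

Because the clipped term caps the benefit of large policy shifts, gradient ascent on this objective keeps consecutive policies close, which is one concrete way to "limit the difference" between updates of structurally identical networks.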
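The ARLO architecture, five structurally identical workers that update one common network and periodically pull its parameters back, each keeping its own experience pool, can be sketched with plain threads. Everything below is a toy stand-in: the `GlobalNet` parameter vector, the random "transitions", and the averaging "gradient" are placeholders for the paper's actual networks and learning rule.

```python
import threading
import random
from collections import deque

class GlobalNet:
    """Shared ("common") network: workers push updates, then pull parameters back."""
    def __init__(self, dim):
        self.params = [0.0] * dim
        self.lock = threading.Lock()

    def apply_update(self, grads, lr=0.01):
        with self.lock:                        # serialize asynchronous updates
            for i, g in enumerate(grads):
                self.params[i] -= lr * g

    def snapshot(self):
        with self.lock:
            return list(self.params)

def worker(global_net, steps, sync_every=5, seed=0):
    rng = random.Random(seed)
    local = global_net.snapshot()              # start from the common parameters
    replay = deque(maxlen=100)                 # per-worker experience pool decorrelates data
    for t in range(steps):
        replay.append([rng.gauss(0, 1) for _ in local])     # stand-in for a transition
        batch = rng.sample(list(replay), min(4, len(replay)))
        grads = [sum(s[i] for s in batch) / len(batch) for i in range(len(local))]
        global_net.apply_update(grads)         # jointly update the common network
        if (t + 1) % sync_every == 0:
            local = global_net.snapshot()      # periodically pull the common parameters

net = GlobalNet(dim=3)
threads = [threading.Thread(target=worker, args=(net, 20), kwargs={"seed": k})
           for k in range(5)]                  # five sub-networks, as in ARLO
for th in threads:
    th.start()
for th in threads:
    th.join()
```

Sampling minibatches from each worker's own replay deque rather than from consecutive transitions is what breaks the correlation between training samples, and running several differently seeded workers in parallel gives the diversity of experience credited with avoiding local optima.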