With the advent of the 5G Internet of Things era, mobile edge computing provides an effective solution for the latency-sensitive services of mobile users. However, resource allocation in mobile edge computing still faces many challenges. In practical scenarios, a large number of users must be served, and each user has multiple decision variables, which makes the decision space very large. The complexity of the scenario means that the resource allocation problem is not a convex optimization problem, so it cannot be solved by conventional optimization methods. Moreover, resource allocation is a sequential decision-making problem that requires real-time decisions at every time slot, and the optimization goal should comprehensively account for long-term performance metrics such as latency and energy consumption. Existing methods therefore struggle to solve the resource allocation problem in mobile edge computing effectively.

Taking the resource allocation problem in mobile edge computing as the research focus, and the design of an efficient, high-performance, and general resource allocation algorithm as the entry point, this paper proposes a deep reinforcement learning framework based on Monte Carlo Tree Search (MCTS) and deep neural networks (DNN). Specifically, this paper proposes an improved intelligent Task Offloading Algorithm (iTOA) based on long short-term memory (LSTM) networks, an intelligent Resource Allocation Framework (iRAF) for collaborative mobile edge computing based on a multi-task deep neural network (MT-DNN), and an intelligent Service Migration Algorithm (iSMA) based on latent-space reasoning.

The main innovations of this paper include:

1. The MCTS algorithm maps the high-dimensional decision space onto the layers of a Monte Carlo tree, which effectively partitions the decision space. Through layer-by-layer search, strategy simulation, reward backtracking, and selective expansion, MCTS steers the policy search toward higher-reward regions, which greatly narrows the search space and improves search efficiency. Since a DNN can predict the prior probability distribution of the decision variables, the DNN is used to guide the MCTS search process. With the DNN, the MCTS search becomes more directed and the decision space is further reduced, so both search speed and accuracy improve substantially. The greatest advantage of the DNN is that it can generalize to unseen states, thereby improving the generality of the algorithm.

2. According to the characteristics of UAV channels, an LSTM channel prediction module is proposed to improve the search accuracy of MCTS. Compared with the traditional algorithm, the performance of the proposed iTOA is improved by 60%. Besides, to match the structure of a resource allocation action composed of multiple sub-actions, this paper proposes an improved MT-DNN that outputs the prior probability distributions of multiple sub-actions at once. Since the MT-DNN can obtain the prior probability distributions of multiple interdependent sub-actions, it improves the performance of joint decision-making. The proposed iRAF improves on traditional algorithms by 59.27%.

3. For the service migration problem, this paper models the problem as a partially observable Markov decision process. This reduces the dimension of the decision space exponentially and allows distributed decisions to take global information into account. This paper further replaces the MCTS algorithm with a cross-entropy planning algorithm for continuous search spaces, which compensates for the performance loss of discrete search in MCTS. The cross-entropy planning algorithm can also be accelerated on GPUs, which greatly improves planning efficiency. To guide the search process, an environment prediction model based on latent-space reasoning is proposed. The model can fully simulate the state transition process of the
environment in the latent space, so the cross-entropy planning algorithm can incorporate predicted future states when making the optimal decision. Compared with the Deep Q-learning Network (DQN) algorithm, the proposed iSMA achieves a 58.1% improvement in delay optimization.
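To make the DNN-guided MCTS loop concrete, the following is a minimal sketch of the four steps named above (layer-by-layer search, selective expansion, DNN-based simulation, and reward backtracking). The toy offloading environment, the uniform `policy_fn` prior, and the linear `value_fn` are hypothetical stand-ins for the thesis's trained networks, not its actual implementation:

```python
import math

class Node:
    """One tree node; `prior` is the DNN's predicted probability of its action."""
    def __init__(self, prior):
        self.prior = prior
        self.visits = 0
        self.value_sum = 0.0
        self.children = {}  # action -> Node

    def value(self):
        return self.value_sum / self.visits if self.visits else 0.0

def puct(parent, child, c_puct=1.5):
    # Exploit the mean backed-up reward; explore in proportion to the DNN prior.
    return child.value() + c_puct * child.prior * math.sqrt(parent.visits) / (1 + child.visits)

def mcts_decide(root_state, policy_fn, value_fn, step_fn, n_sims=200):
    root = Node(prior=1.0)
    for _ in range(n_sims):
        node, state, path = root, root_state, [root]
        # 1) Layer-by-layer search: descend along the highest PUCT score.
        while node.children:
            parent = node
            action, node = max(parent.children.items(),
                               key=lambda kv: puct(parent, kv[1]))
            state = step_fn(state, action)
            path.append(node)
        # 2) Selective expansion: the DNN prior focuses the new branches.
        for action, p in policy_fn(state).items():
            node.children[action] = Node(prior=p)
        # 3) Simulation is replaced by the DNN value estimate of the leaf state.
        reward = value_fn(state)
        # 4) Reward backtracking along the visited path.
        for n in path:
            n.visits += 1
            n.value_sum += reward
    # Final decision: the most-visited action at the root.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]

# Toy usage: three offloading levels per slot; higher levels score higher here.
policy_fn = lambda s: {0: 1/3, 1: 1/3, 2: 1/3}   # uniform stand-in for the DNN prior
value_fn = lambda s: s / 10.0                     # stand-in for the DNN value head
step_fn = lambda s, a: s + a
best = mcts_decide(0, policy_fn, value_fn, step_fn)
```

With a value estimate that favors higher offloading levels, the visit counts concentrate on action 2, illustrating how the prior-guided search narrows the decision space without enumerating it.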
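The continuous-space planner used in place of MCTS can be sketched with the standard cross-entropy method: sample candidate actions from a Gaussian, score them, refit the Gaussian to the elites, and repeat. The quadratic `score` objective below is a hypothetical stand-in for rolling candidates through the latent environment model; the batched sampling and scoring is what makes the planner amenable to GPU acceleration:

```python
import numpy as np

def cross_entropy_plan(score_fn, dim, n_iters=20, pop=500, elite_frac=0.1, seed=0):
    """Cross-entropy planning over a continuous action space.

    score_fn maps a batch of candidate actions (pop, dim) to rewards (pop,);
    in iSMA this would evaluate rollouts of the latent environment model.
    """
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(dim), np.ones(dim)
    n_elite = int(pop * elite_frac)
    for _ in range(n_iters):
        # Sample and score a whole population at once (batch-friendly for GPUs).
        samples = rng.normal(mu, sigma, size=(pop, dim))
        elites = samples[np.argsort(score_fn(samples))[-n_elite:]]
        # Refit the sampling distribution to the elite candidates.
        mu, sigma = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mu

# Toy migration objective: "latency" is minimized at action (0.5, -0.3).
target = np.array([0.5, -0.3])
score = lambda a: -np.sum((a - target) ** 2, axis=1)
best_action = cross_entropy_plan(score, dim=2)
```

Because the distribution contracts around the elites, the planner recovers the continuous optimum directly, avoiding the quantization loss that a discrete MCTS search would incur on the same problem.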