In recent years, with the rapid development of mobile wireless communication technology, the continuous growth in the number of mobile terminals, and the constant emergence of high-speed services, mobile data traffic has been growing explosively. The redundant downloading of the same popular content by massive numbers of users through the core network consumes a large amount of valuable backhaul link resources, aggravates the burden on backhaul links, and significantly degrades the quality of service. To solve this problem, edge caching technology is applied in edge networks. Edge caching takes advantage of backhaul link resources during off-peak periods to proactively place potentially requested content in the storage of edge networks, so that users can quickly acquire service content from local storage units. Edge caching has therefore attracted extensive attention in both academia and industry.

Since cached content is not an instant service and the cache deployment process cannot be completed instantaneously, an edge cache deployment policy needs to predict the content popularity distribution before making caching decisions. However, the mobile edge network is dynamic, and the characteristics of content popularity changes are hard to obtain immediately. To address this problem, reinforcement learning (RL) methods are adopted in edge caching. Owing to their self-learning characteristic, RL-based edge caching methods formulate the cache deployment strategy according to the current network environment state, thereby effectively adapting the cache deployment strategy to the dynamic environment. This paper therefore focuses on RL-based edge cache deployment policies for dynamic edge networks. The main work and innovations of this paper are as follows:

Firstly, to solve the caching problem for a single base station (BS) in dynamic edge networks, an RL-based edge cache deployment policy is proposed. The proposed method exploits the transferability of neural networks (NNs) to solve the cold-start problem of newly initiated neighbor BSs. Specifically, the edge caching problem is modeled as a Markov Decision Process (MDP), based on which the caching state, action, and reward are designed. The Asynchronous Advantage Actor-Critic (A3C) learning method is then applied to the edge caching scenario, and the A3C-based edge caching method uses transfer learning to transfer its NN model to newly initiated neighbor BSs, thus effectively alleviating the cold-start problem. Simulation results show that the proposed A3C-based edge caching method effectively improves the training speed of the caching model and solves the cold-start problem of newly initiated neighbor BSs.
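To make the MDP formulation concrete, the sketch below outlines a caching environment in which the state combines the current cache contents with recent request statistics, the action selects which contents to cache, and the reward is the resulting cache hit ratio. This is a minimal illustration under assumed designs (Zipf-distributed requests, hit-ratio reward); the paper's actual state, action, and reward definitions may differ.

```python
import numpy as np

# Hypothetical MDP formulation of single-BS edge caching.
# State, action, and reward designs are illustrative assumptions.
class EdgeCachingEnv:
    def __init__(self, num_contents=100, cache_size=10, zipf_alpha=0.8, seed=0):
        self.rng = np.random.default_rng(seed)
        self.num_contents = num_contents
        self.cache_size = cache_size
        # Assumed Zipf-like content popularity over the content library.
        ranks = np.arange(1, num_contents + 1, dtype=float)
        self.popularity = ranks ** -zipf_alpha
        self.popularity /= self.popularity.sum()
        self.reset()

    def reset(self):
        # Initially cache the first cache_size contents.
        self.cache = set(range(self.cache_size))
        self.request_counts = np.zeros(self.num_contents)
        return self._state()

    def _state(self):
        # State: cache indicator vector + normalized recent request counts.
        cache_vec = np.zeros(self.num_contents)
        cache_vec[list(self.cache)] = 1.0
        total = max(self.request_counts.sum(), 1.0)
        return np.concatenate([cache_vec, self.request_counts / total])

    def step(self, action):
        # Action: indices of the contents to cache for the next period.
        self.cache = set(int(a) for a in action[: self.cache_size])
        # Simulate a batch of user requests and count cache hits.
        requests = self.rng.choice(self.num_contents, size=50, p=self.popularity)
        hits = sum(1 for r in requests if r in self.cache)
        for r in requests:
            self.request_counts[r] += 1
        reward = hits / len(requests)  # reward: per-step cache hit ratio
        return self._state(), reward
```

An A3C learner would then run several such environments in parallel worker threads, each asynchronously pushing actor and critic gradients to a shared global network.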
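The cold-start transfer itself can be viewed as simple weight reuse: the trained BS's actor-critic parameters initialize the agent of a newly initiated neighbor BS, which then fine-tunes on its own local requests instead of learning from scratch. The PyTorch-style sketch below is a hypothetical illustration; the network architecture and the choice of which layers to freeze are assumptions, not the paper's actual implementation.

```python
import copy
import torch
import torch.nn as nn

# Minimal actor-critic network for a caching agent; layer sizes
# and structure are illustrative assumptions.
class ActorCritic(nn.Module):
    def __init__(self, state_dim, num_contents, hidden=128):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.policy = nn.Linear(hidden, num_contents)  # per-content caching scores
        self.value = nn.Linear(hidden, 1)              # state-value estimate

    def forward(self, state):
        h = self.shared(state)
        return torch.softmax(self.policy(h), dim=-1), self.value(h)

def transfer_to_new_bs(trained_model: ActorCritic) -> ActorCritic:
    """Initialize a newly initiated neighbor BS's agent from a trained
    model rather than from random weights, alleviating cold start."""
    new_model = copy.deepcopy(trained_model)
    # One possible scheme: freeze the shared feature layers and
    # fine-tune only the policy/value heads on the new BS's data.
    for p in new_model.shared.parameters():
        p.requires_grad = False
    return new_model
```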
Secondly, to solve the caching problem for multiple BSs in dynamic edge networks, a federated reinforcement learning (FRL) based edge cache deployment policy is proposed. In this policy, a federated learning (FL) framework is designed to address the problem caused by non-independent and identically distributed (non-i.i.d.) content popularities across the coverage areas of different BSs. Specifically, with the goal of maximizing the total caching benefit of the BSs in the federation, the FRL-based edge cache deployment policy for multiple BSs is proposed, in which a model parameter selection process and a global aggregation process are designed to handle the non-i.i.d. characteristics among edge nodes. Meanwhile, a theoretical bound on the loss function difference is analyzed, based on which a training-times adaptation mechanism is proposed to handle the tradeoff between local training and global aggregation for each edge node in the federation. Numerical simulations verify that the proposed FRL-based edge caching method outperforms other baseline methods in terms of caching benefit, cache hit ratio, and convergence speed.
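As an illustration of how parameter selection and global aggregation might interact, the sketch below has the server average only those local models that remain sufficiently aligned with the previous global model, which is one simple way to dampen the influence of BSs whose non-i.i.d. popularity drives their parameters apart. The cosine-similarity rule and the threshold are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def select_and_aggregate(global_params, local_params_list, threshold=0.5):
    """Hypothetical global aggregation with model parameter selection.

    global_params: flat array of the previous global model's weights.
    local_params_list: list of flat arrays, one per edge node (BS).
    Only local models whose cosine similarity to the previous global
    model exceeds `threshold` take part in the FedAvg-style mean.
    """
    selected = []
    for local in local_params_list:
        denom = np.linalg.norm(local) * np.linalg.norm(global_params)
        sim = float(local @ global_params) / max(denom, 1e-12)
        if sim >= threshold:
            selected.append(local)
    if not selected:  # no aligned model: keep the old global model
        return global_params
    return np.mean(selected, axis=0)
```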
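The training-times adaptation can be sketched in a similarly hedged way: each edge node adjusts how many local training rounds it performs between global aggregations according to how far its local loss has drifted from the global loss, a crude stand-in for the analyzed loss-difference bound. The drift measure, tolerance, and update rule below are purely illustrative.

```python
def adapt_local_rounds(current_rounds, local_loss, global_loss,
                       drift_tolerance=0.1, min_rounds=1, max_rounds=20):
    """Hypothetical training-times adaptation for one edge node.

    Large drift between local and global losses suggests the node is
    overfitting its non-i.i.d. local popularity, so it aggregates
    sooner; small drift allows more local rounds, saving communication.
    """
    drift = abs(local_loss - global_loss) / max(abs(global_loss), 1e-12)
    if drift > drift_tolerance:
        return max(min_rounds, current_rounds - 1)
    return min(max_rounds, current_rounds + 1)
```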