
Research On Multi-agent Reinforcement Learning Method Based On Stein Variational Gradient Descent

Posted on: 2022-06-24
Degree: Master
Type: Thesis
Country: China
Candidate: H Z Chu
GTID: 2518306332977529
Subject: Computer system architecture
Abstract/Summary:
Research on multi-agent reinforcement learning is expanding rapidly and has produced notable results in fields such as robot teams, resource management, distributed control, games, and e-commerce, because a single agent is often insufficient for decision-making in complex situations. Multi-agent reinforcement learning problems are generally computationally intensive, and the agents are interdependent. Even under many idealized constraints, these methods still face the trade-off between exploration and exploitation that is rooted in reinforcement learning algorithms. This trade-off is the focus of our research. We bring Stein variational gradient descent (SVGD), a powerful tool, into our algorithm to address it. Taking intelligent vehicle scheduling in the Internet of Vehicles as the background, we propose a multi-agent scheduling framework. Experiments on the combination of SVGD and the multi-agent framework indicate that it helps handle the trade-off between strategy exploration and exploitation.

In the era of the Internet of Vehicles, intelligent vehicles based on artificial intelligence provide various services to meet people's daily needs. Serving computationally intensive applications on vehicles remains a major challenge. Edge computing provides abundant resources for these services by offloading complex tasks from the central base station to edge computing nodes near the vehicles. However, selecting a suitable node for offloading requires knowing the resource requirements, vehicle movement, and core network conditions in advance, which makes it hard to meet real-time service requirements and quality of experience (QoE). We divide this problem into two subproblems: global node scheduling and autonomous exploration. For node scheduling, we use an improved Kuhn-Munkres (KM) algorithm, which makes full use of existing edge computing nodes. At the same time, we propose a new multi-agent scheduling framework based on the network architecture of the deep deterministic policy gradient (DDPG) algorithm; it recommends potential computation nodes near the vehicles and encourages vehicles to explore autonomously. However, this multi-agent framework on its own focuses only on communication between agents. We therefore introduce SVGD, which can quickly fit the optimal probability distribution. We integrate the policy network parameters with the SVGD particles, which speeds up policy gradient updates and lets the network approximate both the optimal strategy and a diverse set of strategies.

The data sets used in the experiments are all derived from a simulation environment that abstracts the task scheduling problem between intelligent vehicles and roadside units (RSUs). The experiments verify that, in this simulation environment, the joint algorithm accounts for the trade-off between QoE and profit in the task objectives and achieves higher performance.
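
The abstract describes SVGD as a way to fit a target probability distribution quickly with a set of particles. As a rough illustration of that idea only, the following is a minimal one-dimensional sketch of the standard SVGD update with an RBF kernel, fitting particles to a Gaussian target. It is not the thesis's algorithm: the target distribution, the fixed kernel bandwidth, the step size, and all function names (`svgd_step`, `run_svgd`, `grad_log_gaussian`) are assumptions made for this example, and the particles here are plain numbers rather than policy network parameters.

```python
import math

def svgd_step(particles, grad_log_p, step_size=0.1, bandwidth=1.0):
    """One SVGD update for 1-D particles with kernel k(a, b) = exp(-(a - b)^2 / bandwidth)."""
    n = len(particles)
    updated = []
    for xi in particles:
        phi = 0.0
        for xj in particles:
            k = math.exp(-((xj - xi) ** 2) / bandwidth)
            # Derivative of k(xj, xi) with respect to xj; this term repels xi from xj.
            grad_k = -2.0 * (xj - xi) / bandwidth * k
            # Attraction toward high-density regions plus repulsion between particles.
            phi += k * grad_log_p(xj) + grad_k
        updated.append(xi + step_size * phi / n)
    return updated

def run_svgd(particles, grad_log_p, n_iters=1000, **kwargs):
    """Apply svgd_step repeatedly and return the final particle positions."""
    for _ in range(n_iters):
        particles = svgd_step(particles, grad_log_p, **kwargs)
    return particles

# Hypothetical target for illustration: a Gaussian with mean 2 and standard deviation 1.
TARGET_MEAN, TARGET_STD = 2.0, 1.0

def grad_log_gaussian(x):
    """Gradient of the log-density of the Gaussian target."""
    return -(x - TARGET_MEAN) / TARGET_STD ** 2

# Start all particles far from the target so the drift is visible.
initial = [-3.0 + 0.2 * i for i in range(10)]
final = run_svgd(initial, grad_log_gaussian)

mean = sum(final) / len(final)
spread = max(final) - min(final)
print(f"particle mean: {mean:.3f}, spread: {spread:.3f}")
```

The first term in `phi` pulls particles toward regions of high target density, while the kernel-gradient term pushes them apart. That repulsion is what keeps the particle set diverse instead of collapsing onto a single mode, which is the property the abstract relies on when it speaks of a diverse set of strategies.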
Keywords/Search Tags: Stein variational gradient descent, self-imitation, maximum entropy principle, multi-agent reinforcement learning