Research On Multi-agent Reinforcement Learning Method Based On Stein Variational Gradient Descent

Posted on:2022-06-24

Degree:Master

Type:Thesis

Country:China

Candidate:H Z Chu

GTID:2518306332977529

Subject:Computer system architecture

Abstract/Summary:

Research in multi-agent reinforcement learning is expanding rapidly and has produced remarkable results in fields such as robot teams, resource management, distributed control, games, and e-commerce, because a single agent is often insufficient for decision-making in complex situations. Multi-agent reinforcement learning problems are generally computationally intensive, and the agents depend on one another. Even under many idealized constraints, these methods still face the exploration-exploitation trade-off that is rooted in reinforcement learning algorithms. This trade-off is the focus of our research. To address it, we bring Stein variational gradient descent (SVGD), a powerful tool, into our algorithm. Taking intelligent vehicle scheduling in the Internet of Vehicles as the application background, we propose a multi-agent scheduling framework. Experiments on the combination of SVGD and the multi-agent framework indicate that it helps handle the trade-off between policy exploration and exploitation.

In the era of the Internet of Vehicles, intelligent vehicles based on artificial intelligence provide various services to satisfy people's daily needs. Delivering services from computationally intensive applications on vehicles remains a major challenge. Edge computing provides abundant resources for these services by offloading complex tasks from the central base station to edge computing nodes near the vehicles. However, selecting a suitable node for offloading requires knowing the resource requirements, vehicle movement, and core network conditions in advance, which makes it hard to meet real-time service requirements and the quality of user experience (QoE).

We divide this demand into two subproblems: global node scheduling and autonomous exploration. For node scheduling we use an improved Kuhn-Munkres (KM) algorithm, which makes full use of the existing edge computing nodes. At the same time, we propose a new multi-agent scheduling framework based on the network architecture of the DDPG algorithm. It recommends potential computation nodes near the vehicles and encourages vehicles to explore autonomously. However, this multi-agent framework focuses only on communication between agents. We therefore introduce SVGD, which can quickly fit the optimal probability distribution. We integrate the policy network parameters with the particles in our algorithm, which speeds up policy gradient updates and lets the network approximate both the optimal policy and a diverse set of policies.

The data sets used in the experiments are all derived from a simulation environment that abstracts the task scheduling problem between intelligent vehicles and roadside units (RSUs). The experiments verify that, in the simulation environment, the joint algorithm accounts for the trade-off between QoE and profit in our task objectives and ultimately achieves higher performance.
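To make the SVGD idea mentioned in the abstract more concrete, the following is a minimal sketch of the generic SVGD particle update with an RBF kernel and a median-heuristic bandwidth. It is not the thesis's implementation: the function names (rbf_kernel, svgd_step, run_demo), the step size, the particle count, and the one-dimensional Gaussian target are all assumptions chosen for illustration, whereas the thesis applies the idea to policy network parameters.

```python
import numpy as np


def rbf_kernel(x, h=None):
    """RBF kernel matrix and its summed gradient for particles x of shape (n, d).

    Returns k with k[i, j] = exp(-||x_i - x_j||^2 / h) and grad_k with
    grad_k[i] = sum_j d/dx_j k(x_j, x_i), the repulsive term of SVGD.
    """
    diff = x[:, None, :] - x[None, :, :]          # diff[i, j] = x_i - x_j
    sq_dist = np.sum(diff ** 2, axis=-1)          # squared distances, shape (n, n)
    if h is None:
        # Median heuristic for the bandwidth (an assumption, commonly used with SVGD).
        h = np.median(sq_dist) / np.log(x.shape[0] + 1.0) + 1e-8
    k = np.exp(-sq_dist / h)
    grad_k = (2.0 / h) * np.sum(diff * k[:, :, None], axis=1)
    return k, grad_k


def svgd_step(x, grad_log_p, step=0.1):
    """One SVGD update: kernel-weighted score (attraction) plus kernel gradient (repulsion)."""
    n = x.shape[0]
    k, grad_k = rbf_kernel(x)
    phi = (k @ grad_log_p(x) + grad_k) / n
    return x + step * phi


def run_demo(n_particles=30, n_steps=2000, seed=0):
    """Move particles from N(0, 1) toward a toy target N(2, 1) (hypothetical example)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, size=(n_particles, 1))

    def grad_log_p(z):
        # Score function of the toy Gaussian target with mean 2 and unit variance.
        return -(z - 2.0)

    for _ in range(n_steps):
        x = svgd_step(x, grad_log_p)
    return x


if __name__ == "__main__":
    particles = run_demo()
    print("particle mean:", particles.mean(), "particle std:", particles.std())
```

The attraction term pulls particles toward high-probability regions while the repulsion term keeps them spread out, which is the property the abstract relies on when it describes approximating both the optimal policy and a diverse set of policies.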

Keywords/Search Tags:

Stein variational gradient descent, self-imitation, the maximum entropy principle, multi-agent reinforcement learning

Related items

1	Research On Reinforcement Learning Method For Game Manipulation Behavior Imitation
2	Regularized Maximum Entropy Imitation Learning Based On Prior Reward Of Trajectory
3	Supervised Reinforcement Learning: Methods And Applications
4	Reinforcement Learning Agent Design Based On Deep Perception And Imitation Learning
5	Research On Deep Reinforcement Learning Technology For Multi-agent Collaboration
6	Multi-agent Coordinated Control Technology Based On Reinforcement Learning
7	Deep Reinforcement Learning Based On Policy Gradient Optimization And Its Application In Agent Control
8	Research On Multi-Agent Pursuit-Evasion Based On Deep Reinforcement Learning
9	Research On Policy Learning Via Imitation
10	Self-Organizing Collaborative Target Search Of Mobile Multi-Agent Based On Reinforcement Learning