Multi-agent Coordinated Control Technology Based On Reinforcement Learning

Posted on: 2022-05-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y S Jiang
Full Text: PDF
GTID: 2518306494986429
Subject: Computer technology
Abstract/Summary:
With the help of deep learning, reinforcement learning has become a powerful method for solving sequential decision-making problems. Reinforcement learning refers to the process in which an agent repeatedly interacts with the environment to improve its policy and maximize its return. Multi-agent reinforcement learning, a branch of reinforcement learning, studies the problem from the perspective of multiple agents. It has a wide range of applications, such as traffic flow control, multiplayer adversarial games, and autonomous driving, and is becoming a focus of reinforcement learning research. This thesis studies algorithms for, and applications of, reinforcement learning in the multi-agent setting. The research contents are as follows:

· Improving a commonly used multi-agent reinforcement learning algorithm. In multi-agent reinforcement learning, agents are easily influenced by other agents and by the training environment, which leads to low-quality policies and slow convergence. This thesis proposes a novel algorithm, the -Maximum Critic Multi-Agent Deep Deterministic Policy Gradient algorithm (-M2DDPG), which uses a new critic technique called -Maximum Critic to balance exploitation and exploration when updating the Q-value function. Inspired by the attention mechanism, it further proposes the -Maximum Attention Critic Multi-Agent Deep Deterministic Policy Gradient (-MA2DDPG) algorithm to improve computational efficiency. Both algorithms are evaluated empirically in five mixed cooperative and communication environments. The experimental results show that they significantly accelerate learning and outperform the existing baseline algorithm, MADDPG.

· Applying the algorithm to multi-group epidemic prevention and control, in order to optimize mobility intervention strategies, reduce intervention costs, and control the spread of infectious diseases. The method dynamically divides the population into five categories according to each individual's status information and uses a reinforcement learning algorithm to learn an effective strategy for each group that maximizes the collective reward, so that the groups cooperate to learn the optimal strategy. Experimental results show that the algorithm outperforms the existing benchmark algorithm.
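
For orientation, the baseline named in the abstract, MADDPG, trains each agent's critic against a temporal-difference target computed by a centralized target critic that sees all agents' observations and actions. The sketch below illustrates only that standard baseline target, y = r + gamma * Q'(x', a'_1, ..., a'_N). It is not the thesis's -Maximum Critic, whose definition the abstract does not give. The function name `maddpg_td_target` and the toy policy and critic are hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def maddpg_td_target(
    reward: float,
    next_observations: List[Vector],
    target_policies: List[Callable[[Vector], Vector]],
    target_critic: Callable[[List[Vector], List[Vector]], float],
    gamma: float = 0.95,
    done: bool = False,
) -> float:
    """Baseline MADDPG-style TD target for one agent (illustrative sketch)."""
    # Each agent's target policy picks its next action from its own observation.
    next_actions = [pi(obs) for pi, obs in zip(target_policies, next_observations)]
    # The centralized target critic scores the joint observations and actions.
    next_q = target_critic(next_observations, next_actions)
    # Do not bootstrap past a terminal state.
    return reward + (0.0 if done else gamma * next_q)


# Hypothetical toy stand-ins: two agents with 1-D observations and actions.
def toy_policy(obs: Vector) -> Vector:
    return [0.5 * obs[0]]


def toy_critic(observations: List[Vector], actions: List[Vector]) -> float:
    return sum(o[0] for o in observations) + sum(a[0] for a in actions)


y = maddpg_td_target(
    reward=1.0,
    next_observations=[[2.0], [4.0]],
    target_policies=[toy_policy, toy_policy],
    target_critic=toy_critic,
    gamma=0.9,
)
```

In a real implementation the policies and critic would be neural networks, and the critic would be trained to minimize the squared difference between its prediction and this target.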
Keywords/Search Tags: Multi-agent, Reinforcement learning, Maximum Critic, Epidemic Control