
Research On Multi-agent Cooperation Strategy Based On Reinforcement Learning

Posted on: 2021-03-15  Degree: Master  Type: Thesis
Country: China  Candidate: C Liang  Full Text: PDF
GTID: 2428330602979278  Subject: Control engineering
Abstract/Summary:
How to apply reinforcement learning to accomplish specific tasks in a multi-agent environment has long been a difficult problem in the field. Effective communication and coordination among multiple agents is an important step toward general artificial intelligence. Many traditional reinforcement learning algorithms can train a single agent in a simple environment; in a multi-agent environment, however, the complexity and dynamic nature of the environment make learning much harder: the state-action space explodes in dimensionality, target rewards are difficult to define, and the algorithms become unstable and hard to converge. This thesis introduces a multi-agent reinforcement learning method based on an improved DDPG. By combining the DDPG model with a bidirectional recurrent neural network and comparing it against other algorithms, the improved algorithm achieves significant gains in convergence speed and task completion. The main research contents are as follows:

(1) Survey the research status, at home and abroad, of traditional (single-agent) reinforcement learning and of multi-agent reinforcement learning algorithms; introduce the model structures of the classical algorithms and the basics of game theory as applied to multi-agent environments; and propose a multi-agent reinforcement learning algorithm based on inter-agent communication.

(2) For two recent communication-based multi-agent reinforcement learning algorithms, MADDPG and BiCNet, redefine the environmental rewards and tasks within the existing experimental environments, carry out experiments in different environments, and analyze the advantages and limitations of each from the experimental results. Combining the optimization ideas of these two algorithms, an improved algorithm based on DDPG is proposed.

(3) To address the low performance of the first two algorithms and their difficulty in adapting to different environments, the Mi-DDPG (Mixed Deep Deterministic Policy Gradient) algorithm first adds a bidirectional recurrent network to the Actor network as an information layer for homogeneous agents, and then adds heterogeneous-agent information to the Critic network to learn the multi-agent cooperation strategy. In addition, to reduce the training burden, a centralized-training, distributed-execution framework is adopted and the Q function in the Critic network is modularized. This improves both the performance and the execution efficiency of the algorithm while preserving its generalization ability across different environments.

(4) In the experiments, Mi-DDPG is compared with other algorithms in different scenarios. It shows clear improvements in convergence speed and task completion, and has potential value in real-world applications.
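The core idea of the information layer described in (3) can be sketched as follows. This is an illustrative NumPy toy, not the thesis's implementation: the class name, weight shapes, and use of plain tanh recurrent cells (rather than a trained deep bidirectional RNN inside a full DDPG actor) are all assumptions made for clarity. The point it demonstrates is that when a bidirectional recurrent layer runs over the sequence of homogeneous agents' observations, each agent's deterministic action can depend on information from every other agent.

```python
import numpy as np

class BiRNNActor:
    """Toy actor: a bidirectional recurrent layer scans the sequence of
    homogeneous agents' observations (the 'information layer'), so each
    agent's action is conditioned on all agents' observations."""

    def __init__(self, obs_dim, hidden_dim, act_dim, seed=0):
        rng = np.random.default_rng(seed)
        s = 0.1
        self.W_f = rng.normal(0, s, (hidden_dim, obs_dim))    # forward input weights
        self.U_f = rng.normal(0, s, (hidden_dim, hidden_dim)) # forward recurrent weights
        self.W_b = rng.normal(0, s, (hidden_dim, obs_dim))    # backward input weights
        self.U_b = rng.normal(0, s, (hidden_dim, hidden_dim)) # backward recurrent weights
        self.W_o = rng.normal(0, s, (act_dim, 2 * hidden_dim))  # per-agent action head
        self.hidden_dim = hidden_dim

    def forward(self, obs_seq):
        """obs_seq: (n_agents, obs_dim) -> actions: (n_agents, act_dim)."""
        n = obs_seq.shape[0]
        h_f = np.zeros((n, self.hidden_dim))
        h_b = np.zeros((n, self.hidden_dim))
        h = np.zeros(self.hidden_dim)
        for t in range(n):                    # forward sweep over agents
            h = np.tanh(self.W_f @ obs_seq[t] + self.U_f @ h)
            h_f[t] = h
        h = np.zeros(self.hidden_dim)
        for t in reversed(range(n)):          # backward sweep over agents
            h = np.tanh(self.W_b @ obs_seq[t] + self.U_b @ h)
            h_b[t] = h
        # concatenate both directions; deterministic actions bounded in [-1, 1]
        return np.tanh(np.concatenate([h_f, h_b], axis=1) @ self.W_o.T)

actor = BiRNNActor(obs_dim=4, hidden_dim=8, act_dim=2)
actions = actor.forward(np.ones((3, 4)))  # three homogeneous agents
print(actions.shape)                      # one action vector per agent: (3, 2)
```

Because the recurrent sweeps run in both directions, perturbing one agent's observation changes the actions of the others, which is exactly what a per-agent feedforward actor (as in vanilla DDPG) cannot do without explicit communication.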
Keywords/Search Tags: reinforcement learning, deep learning, multi-agent, RNN, DDPG, Actor-Critic