
Research On Mean-Field Multi-Agent Reinforcement Learning In Large Scale Scenarios

Posted on: 2022-12-08
Degree: Master
Type: Thesis
Country: China
Candidate: T Y Wu
Full Text: PDF
GTID: 2518306776492674
Subject: Automation Technology

Abstract/Summary:
With the continuous development of deep neural networks, deep reinforcement learning algorithms have achieved remarkable success in fields such as operations research, robot control, and autonomous driving. However, in scenarios closer to the real world, there is often more than one agent, forming a multi-agent environment in which competition and cooperation coexist. Multi-agent reinforcement learning algorithms still face the following challenges: (1) As the number of agents grows, the state space expands sharply; in many-agent scenarios especially, the curse of dimensionality is unavoidable. (2) Because the environment is non-stationary, the optimal policy is strongly affected by the dynamic changes of the other agents. Existing multi-agent reinforcement learning algorithms are therefore suitable only for scenarios with dozens of agents, not for many-agent scenarios. Given the exponential growth of the state space and the accumulation of noise from other agents, algorithm research in many-agent scenarios calls for a new direction.

To address these problems, this thesis proposes Mean-Field multi-agent reinforcement learning with Collaborator-Learning Awareness (MFCLA) and weighted Mean-Field multi-agent reinforcement learning via Reward Attribution Decomposition (MFRAD), from the perspectives of anticipating neighbor behavior and differentiating neighbor information, respectively. The main contributions are as follows:

1. To exploit the dynamically changing information of neighboring agents, this thesis introduces collaborator-learning awareness into many-agent scenarios. The collaborator-aware learner optimizes its return under a one-step lookahead of the collaborators' learning and shapes the policy updates of its neighbors, differentiating through the collaborators' learning step to cope with their dynamic learning and find the optimal solution.

2. Since the computational complexity of the regularizing term is sensitive to the number of agents, MFCLA approximates the learning procedure and the mean-field influence of the other agents through a proposed mean-field agent.

3. To differentiate the mean-field effect of different neighbors and mitigate the stale historical information of neighbors in the original mean-field approximation, MFRAD introduces a weighted mean-field form whose weighting coefficients are computed by an attention mechanism, so that neighborhood information is processed differentially.

4. Building on the weighted mean-field approximation, MFRAD exploits reward attribution decomposition to convert the local Q value of the ego agent into the sum of its own effect and the neighbors' weighted mean-field effect. Fully decentralized execution is achieved while lagging information is eliminated.

Experimental results in multiple many-agent scenarios show that the proposed algorithms outperform the benchmark algorithms in convergence speed, stability, and scalability, and achieve a higher win rate in the test stage, demonstrating the effectiveness of the proposed methods.
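The idea of optimizing one's return under a one-step lookahead of a collaborator's learning (point 1) can be illustrated on a toy two-agent matrix game. This is only an illustrative sketch of the lookahead principle, not the thesis's exact MFCLA formulation: the game, the softmax policies, and the finite-difference gradients are all assumptions made for the example.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def expected_return(theta_self, theta_other, payoff):
    # payoff[i, j]: return to this agent when it plays action i
    # and the other agent plays action j.
    return softmax(theta_self) @ payoff @ softmax(theta_other)

def num_grad(f, x, eps=1e-5):
    # Central finite-difference gradient, to keep the sketch dependency-free.
    g = np.zeros_like(x)
    for i in range(x.size):
        d = np.zeros_like(x)
        d[i] = eps
        g[i] = (f(x + d) - f(x - d)) / (2 * eps)
    return g

def lookahead_step(theta_a, theta_b, payoff_a, payoff_b, lr=0.5):
    # 1) anticipate the collaborator's one-step policy-gradient update
    grad_b = num_grad(lambda t: expected_return(t, theta_a, payoff_b), theta_b)
    theta_b_ahead = theta_b + lr * grad_b
    # 2) optimize the ego return under the anticipated collaborator parameters,
    #    i.e. differentiate through the collaborator's learning step
    grad_a = num_grad(lambda t: expected_return(t, theta_b_ahead, payoff_a), theta_a)
    return theta_a + lr * grad_a

# toy coordination game: both agents prefer to coordinate on action 0
payoff = np.array([[2.0, 0.0],
                   [0.0, 1.0]])
theta_a = np.zeros(2)
theta_b = np.zeros(2)
theta_a_new = lookahead_step(theta_a, theta_b, payoff, payoff.T)
```

Because the ego gradient is taken at the collaborator's anticipated parameters rather than its current ones, the update accounts for how the neighbor's policy is about to change, which is the core of the collaborator-learning-awareness idea.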
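The weighted mean-field form with attention coefficients (point 3) and the reward attribution decomposition of the local Q value (point 4) can be sketched as follows. The dot-product attention scoring, the feature and action dimensions, and the scalar effect terms are illustrative assumptions; the thesis's actual networks and decomposition details are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_weights(ego_feat, neighbor_feats, temperature=1.0):
    # Score each neighbor by dot-product attention against the ego agent's
    # feature vector, then normalize the scores with a softmax.
    scores = neighbor_feats @ ego_feat / temperature
    return softmax(scores)

def weighted_mean_action(neighbor_actions, weights):
    # Weighted mean-field approximation: replace the joint neighbor action
    # with a single attention-weighted average action, so different
    # neighbors contribute differently.
    return weights @ neighbor_actions

def decomposed_q(q_own, q_mf, weights):
    # Reward-attribution-style decomposition: the ego agent's local Q value
    # is the sum of its own effect and the neighbors' weighted
    # mean-field effect.
    return q_own + weights @ q_mf

# toy example: one ego agent, 4 neighbors, 3-dim features, 2-dim actions
ego = rng.normal(size=3)
neigh = rng.normal(size=(4, 3))
acts = rng.normal(size=(4, 2))

w = attention_weights(ego, neigh)
a_bar = weighted_mean_action(acts, w)
```

Since the weights are recomputed from current features at every step, the ego agent's Q value depends on up-to-date neighbor information rather than a uniform average over possibly stale neighbor actions.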
Keywords/Search Tags:Multi-Agent Reinforcement Learning, Mean-Field Theory, Opponent Modeling, Reward Attribution Decomposition