
Research On Actor-Critic Framework Based Mean-Field Control Algorithm

Posted on: 2023-02-12
Degree: Master
Type: Thesis
Country: China
Candidate: B Jin
Full Text: PDF
GTID: 2568306830960389
Subject: Applied Mathematics
Abstract/Summary:
Multi-agent reinforcement learning is a class of machine learning methods for adaptive control in multi-agent systems. It is widely used in game AI, autonomous driving, robot formation, resource scheduling, and other complex real-world scenarios, and is an effective way to achieve intelligent control of complex systems. However, current multi-agent reinforcement learning algorithms mainly target environments with a limited number of agents. In large-scale settings, the curse of dimensionality caused by the growing number of agents prevents many methods from effectively handling the combinatorial optimization of the value function over the huge joint state-action space. To address this problem, and based on the idea of centralized training with decentralized execution, this paper proposes a multi-agent reinforcement learning algorithm that uses the actor-critic framework to solve mean-field control problems. The details are as follows.

First, an actor-critic based mean-field control algorithm (MFC-AC) is proposed to estimate the value function in a lifted state-action space, where both the state and the action are continuous probability distributions. The algorithm decomposes the action space into subspaces corresponding to each state, uses multiple actor networks to learn the sub-policy for each state, and employs a central critic network that aggregates the actions of all actors to estimate the global Q-value. Numerical experiments show that the proposed algorithm is effective.

Second, because MFC-AC cannot fully explore the decision space and, relying on an external exploration mechanism, learns only a single-mode strategy in complex multi-objective tasks, a conditional-entropy mean-field control algorithm (MFC-CEAC) is proposed. Building on MFC-AC, it adds the conditional entropy of each sub-policy to the objective function as a regularization term, maximizing the expected return while keeping the policy as stochastic as possible. A network congestion-control reinforcement learning environment with multi-objective rewards is then built to test the algorithm. The numerical results show that MFC-CEAC learns multi-modal strategies during training and obtains higher returns than MFC-AC, which verifies that MFC-CEAC is a feasible solution to the large-scale mean-field control problem.

The thesis contains 21 figures, 11 tables, and 53 references.
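The structure described above (per-state actor networks, a central critic over the lifted state-action pair, and a conditional-entropy regularizer) can be sketched as follows. This is a minimal illustrative sketch, not the thesis's implementation: the linear "networks", the sizes `N_STATES`/`N_ACTIONS`, and the weight `ALPHA` are all assumed for demonstration, with simple linear maps standing in for the actual actor and critic networks.

```python
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS = 3, 4   # illustrative sizes (assumed)
ALPHA = 0.1                  # entropy-regularization weight (hypothetical value)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

# One actor per state: each maps the mean-field state distribution mu to a
# sub-policy pi(.|s), i.e. a distribution over actions (linear layer + softmax
# stands in for an actor network).
actor_weights = rng.normal(size=(N_STATES, N_ACTIONS, N_STATES))

def sub_policies(mu):
    """Return the matrix pi[s, a] = pi(a | s) produced by the per-state actors."""
    return np.stack([softmax(W @ mu) for W in actor_weights])

# Central critic: scores the aggregated (lifted) state-action pair (mu, pi).
# A simple bilinear form stands in for the critic network.
critic_weights = rng.normal(size=(N_STATES, N_ACTIONS))

def critic_q(mu, pi):
    """Global Q-value estimate for mean-field state mu and joint sub-policies pi."""
    return float(mu @ (critic_weights * pi).sum(axis=1))

def conditional_entropy(mu, pi):
    """H(pi | mu) = sum_s mu(s) * H(pi(.|s)): the MFC-CEAC regularizer."""
    h_per_state = -(pi * np.log(pi + 1e-12)).sum(axis=1)
    return float(mu @ h_per_state)

def regularized_objective(mu):
    """MFC-CEAC-style objective: Q(mu, pi) + alpha * H(pi | mu)."""
    pi = sub_policies(mu)
    return critic_q(mu, pi) + ALPHA * conditional_entropy(mu, pi)

mu = np.array([0.5, 0.3, 0.2])   # example mean-field state distribution
pi = sub_policies(mu)
print(pi.shape)                  # one sub-policy (row) per state
print(regularized_objective(mu))
```

In training, the actors would ascend the gradient of this regularized objective while the critic is fitted by temporal-difference learning; the entropy term keeps each sub-policy stochastic, which is what allows multi-modal strategies to survive in multi-objective tasks.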
Keywords/Search Tags: multi-agent system, multi-agent reinforcement learning, mean-field control, actor-critic framework, regularization