
Research On Actor-Critic Framework Based Mean-Field Control Algorithm

Posted on: 2023-02-12
Degree: Master
Type: Thesis
Country: China
Candidate: B Jin
Full Text: PDF
GTID: 2568306830960389
Subject: Applied Mathematics
Abstract/Summary:
Multi-agent reinforcement learning is a class of machine learning methods for adaptive control in multi-agent systems. It is widely used in game AI, autonomous driving, robot formation, resource scheduling, and other complex real-world scenarios, and is an effective way to achieve intelligent control of complex systems. However, current multi-agent reinforcement learning algorithms mainly target environments with a limited number of agents. In large-scale settings, the curse of dimensionality caused by the growing number of agents prevents many methods from effectively handling the combinatorial optimization of the value function over the huge joint state-action space. To address this problem, and based on the idea of centralized training with decentralized execution, this paper proposes a multi-agent reinforcement learning algorithm that uses the actor-critic framework to solve mean-field control problems. The details are as follows.

First, an actor-critic based mean-field control algorithm (MFC-AC) is proposed to estimate the value function in a lifted state-action space, where both the state and the action are continuous probability distributions. The algorithm decomposes the action space into subspaces corresponding to each state, uses multiple actor networks to learn the sub-policy for each state, and employs a central critic network that aggregates the actions of all actors to estimate the global Q-value. Numerical experiments show that the proposed algorithm is effective.

Second, because MFC-AC cannot fully explore the decision space and, relying on an external exploration mechanism, learns only a single-mode strategy in complex multi-objective tasks, a conditional-entropy mean-field control algorithm (MFC-CEAC) is proposed. Building on MFC-AC, it adds the conditional entropy of each sub-policy to the objective function as a regularization term, maximizing the expected return while keeping the policy as stochastic as possible. A network congestion-control reinforcement learning environment with multi-objective rewards is then built to test the algorithm. The numerical results show that MFC-CEAC learns multi-modal strategies during training and obtains higher returns than MFC-AC, which verifies that MFC-CEAC is a feasible solution to the large-scale mean-field control problem.

The thesis contains 21 figures, 11 tables, and 53 references.
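The structure described above (per-state actor networks, a central critic over the lifted state-action pair, and a conditional-entropy regularizer) can be sketched as follows. This is a minimal illustrative sketch, not the thesis's implementation: the linear "networks", the sizes `N_STATES`/`N_ACTIONS`, and the weight `ALPHA` are all assumed for demonstration, with simple linear maps standing in for the actual actor and critic networks.

```python
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS = 3, 4   # illustrative sizes (assumed)
ALPHA = 0.1                  # entropy-regularization weight (hypothetical value)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

# One actor per state: each maps the mean-field state distribution mu to a
# sub-policy pi(.|s), i.e. a distribution over actions (linear layer + softmax
# stands in for an actor network).
actor_weights = rng.normal(size=(N_STATES, N_ACTIONS, N_STATES))

def sub_policies(mu):
    """Return the matrix pi[s, a] = pi(a | s) produced by the per-state actors."""
    return np.stack([softmax(W @ mu) for W in actor_weights])

# Central critic: scores the aggregated (lifted) state-action pair (mu, pi).
# A simple bilinear form stands in for the critic network.
critic_weights = rng.normal(size=(N_STATES, N_ACTIONS))

def critic_q(mu, pi):
    """Global Q-value estimate for mean-field state mu and joint sub-policies pi."""
    return float(mu @ (critic_weights * pi).sum(axis=1))

def conditional_entropy(mu, pi):
    """H(pi | mu) = sum_s mu(s) * H(pi(.|s)): the MFC-CEAC regularizer."""
    h_per_state = -(pi * np.log(pi + 1e-12)).sum(axis=1)
    return float(mu @ h_per_state)

def regularized_objective(mu):
    """MFC-CEAC-style objective: Q(mu, pi) + alpha * H(pi | mu)."""
    pi = sub_policies(mu)
    return critic_q(mu, pi) + ALPHA * conditional_entropy(mu, pi)

mu = np.array([0.5, 0.3, 0.2])   # example mean-field state distribution
pi = sub_policies(mu)
print(pi.shape)                  # one sub-policy (row) per state
print(regularized_objective(mu))
```

In training, the actors would ascend the gradient of this regularized objective while the critic is fitted by temporal-difference learning; the entropy term keeps each sub-policy stochastic, which is what allows multi-modal strategies to survive in multi-objective tasks.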
Keywords/Search Tags: multi-agent system, multi-agent reinforcement learning, mean-field control, actor-critic framework, regularization