
Cooperative Multi-Agent Reinforcement Learning Algorithms Based on State Prediction

Posted on: 2013-06-28 | Degree: Master | Type: Thesis
Country: China | Candidate: G Chen | Full Text: PDF
GTID: 2248330374488485 | Subject: Control Science and Engineering
Abstract/Summary:
This thesis addresses cooperative multi-agent systems (MAS), which are widespread in daily life, and studies the "curse of dimensionality" that arises when reinforcement learning (RL) is scaled to MAS to learn cooperative behavior. RL has long been an important branch of machine learning because of its self-learning capability. To enhance the intelligence and adaptability of cooperative behavior in MAS, many researchers have introduced RL, which is rooted in single-agent systems, into MAS. However, since RL itself suffers from the "curse of dimensionality", both the learning and storage spaces grow exponentially with the number of agents when it is applied to MAS. This greatly aggravates the "curse of dimensionality", significantly reduces learning efficiency, and may even prevent convergence within an acceptable time.

This thesis focuses on this "curse of dimensionality" in MAS. By introducing state prediction into multi-agent RL, it decomposes the learning and storage spaces in a reasonable and effective way, thereby reducing those spaces and relieving the "curse of dimensionality" while guaranteeing the convergence of the algorithm and improving the decision-making ability of the agents.

First, starting from the formal definition of RL, the thesis analyzes the primary cause of the "curse of dimensionality" when RL is scaled to MAS, providing a theoretical basis for solving the problem.

Second, following the idea of state prediction in MAS, a new multi-agent Q-learning algorithm based on joint state value approximation, named MQVA, is proposed. MQVA decomposes the learning over joint states and joint actions into two processes: learning individual actions and approximating the joint state value. This not only relieves the "curse of dimensionality" and accelerates learning, but also guarantees convergence under the stated assumption.

Third, to remove that assumption and broaden the applicability of MQVA, the thesis adopts a new view of MAS: the other agents are treated as part of the environment, forming a generalized environment that is non-stationary. Building on the self-adaptation of RL and on state prediction, which compensates for the non-stationarity of the generalized environment, a new multi-agent reinforcement learning framework based on the optimal trajectory is presented. The state prediction and action selection mechanisms are designed according to the characteristics of this framework, and the realization builds on model-based RL.

Simulation results show that the state-prediction-based algorithms presented in this thesis can handle the "curse of dimensionality" and significantly enhance the agents' learning efficiency, demonstrating the effectiveness of the proposed algorithms.
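The abstract gives no implementation details of MQVA, so the following is only a minimal illustrative sketch in Python of the storage argument it makes, with made-up parameters (n_agents, n_states, n_actions): a centralised Q-table over joint states and joint actions grows exponentially with the number of agents, whereas a factored scheme of the kind described, with per-agent action learning plus a shared joint-state value approximation, grows roughly linearly.

# Illustrative sketch only -- not the thesis's MQVA implementation.
# Hypothetical sizes: n_agents agents, each with n_states local states
# and n_actions local actions.
n_agents, n_states, n_actions = 4, 10, 5

# Centralised joint Q-learning: one entry per joint state-action pair,
# i.e. (n_states ** n_agents) * (n_actions ** n_agents) entries,
# exponential in the number of agents.
joint_entries = (n_states ** n_agents) * (n_actions ** n_agents)

# Factored scheme in the spirit of the abstract: each agent learns over its
# own state-action space, plus one shared table approximating the value of
# the (predicted) joint state; simplified here to one value per local state
# per agent.
per_agent_entries = n_agents * n_states * n_actions
state_value_entries = n_agents * n_states
factored_entries = per_agent_entries + state_value_entries

print(f"joint Q-table entries:   {joint_entries:,}")    # 6,250,000
print(f"factored scheme entries: {factored_entries:,}") # 240

With these hypothetical sizes the centralised table needs millions of entries while the factored scheme needs a few hundred, which is the exponential-versus-linear gap the abstract attributes to decomposing the joint learning and storage space.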
Keywords/Search Tags: cooperative multi-agent system, curse of dimensionality, state prediction mechanism, action selection mechanism