
Decentralized Multi-agent Cooperative Learning Based On Timesharing Tracking Framework

Posted on: 2015-01-30
Degree: Master
Type: Thesis
Country: China
Candidate: B Fu
Full Text: PDF
GTID: 2298330434453087
Subject: Control Science and Engineering
Abstract/Summary:
Cooperative multi-agent systems (MAS) are widespread in daily life. This thesis studies how such systems can learn behavioral strategies through reinforcement learning (RL), with emphasis on three problems: the "curse of dimensionality", credit assignment, and proof of convergence. The starting point is RL theory, whose basic definitions are discussed for discrete environments. RL has been applied to cooperative MAS because of its capacity for self-learning, and as MAS theory has progressed, RL has developed into the more complex field of multi-agent reinforcement learning (MARL). However, the "curse of dimensionality", low learning efficiency, and the lack of convergence guarantees remain obstacles to its wider application in cooperative MAS.

To address these problems, a new cooperative learning framework for MAS, the timesharing tracking framework (TTF), is proposed on top of dimension-reduced learning in decentralized RL. On the one hand, dimension-reduced Q-learning relieves the "curse of dimensionality". On the other hand, the TTF guarantees convergence of the cooperative strategy optimization and ultimately resolves the credit assignment problem. Simultaneous learning in decentralized MARL is also discussed.

Firstly, building on basic RL theory and the cooperative MAS learning environment, new dimension-reduced reward and value functions are defined. Dimension-reduced best response learning is analyzed from the perspective of adaptation, and the best response learning algorithm is proved to converge under the assumption that the non-learning agents keep their strategies fixed.

Secondly, the TTF is proposed for decentralized RL in MAS, based on the dimension-reduced V value function. The framework is analyzed with respect to strategy search, dimension reduction, and simultaneous learning, and a switching principle is given for its application. Assuming that the individual rewards are known, a new learning algorithm for fully cooperative MAS is obtained by embedding best response learning in the TTF. Simulation results on box-pushing and three-link robot tasks show the efficiency and effectiveness of the proposed algorithm.

Thirdly, to extend the TTF to the general case, a stochastic approximation is introduced to estimate the individual reward in general cooperative MAS, which realizes credit assignment. Combining this estimate with best response learning yields a general cooperative MARL algorithm under the TTF. Simulation results on box-pushing and three-link robot tasks show the feasibility and effectiveness of the individual reward approximation method.

Finally, to deal with poor convergence in fully cooperative multi-agent learning and to adapt to the system environment, a two-stage adaptive learning algorithm is proposed based on decentralized reinforcement learning, which is suited to practical problems with large-scale spaces. Simulation results on three-link robots show the high efficiency of the proposed algorithm.
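
To make the timesharing idea concrete, the following is a minimal Python sketch, not the algorithm from the thesis. It assumes a stateless two-agent game with a shared reward, and the payoff table `REWARD`, the function names, and all hyperparameters (`n_slots`, `slot_len`, `alpha`, `epsilon`) are hypothetical choices made for illustration. Each agent keeps Q-values over its own actions only, and in each time slot one agent learns while the other holds its greedy action fixed.

```python
import random

# Hypothetical shared payoff for a two-agent, stateless cooperative game:
# REWARD[a0][a1] is the common reward when agent 0 plays a0 and agent 1 plays a1.
REWARD = [
    [1.0, 0.0, 0.0],
    [2.0, 3.0, 0.0],
    [0.0, 4.0, 5.0],
]
N_ACTIONS = 3


def greedy(q):
    """Return the index of the largest Q-value (ties go to the lowest index)."""
    return max(range(len(q)), key=lambda a: q[a])


def timeshared_q_learning(n_slots=8, slot_len=200, alpha=0.5, epsilon=0.5, seed=0):
    """Alternate learning between two agents, one learner per time slot."""
    rng = random.Random(seed)
    # Each agent stores Q over its OWN actions only (3 entries each),
    # instead of over the 3 x 3 joint-action space.
    q = [[0.0] * N_ACTIONS for _ in range(2)]
    for slot in range(n_slots):
        learner = slot % 2                 # agents take turns learning
        other = 1 - learner
        fixed_action = greedy(q[other])    # the non-learner holds its action fixed
        for _ in range(slot_len):
            # Epsilon-greedy choice for the learning agent only.
            if rng.random() < epsilon:
                a = rng.randrange(N_ACTIONS)
            else:
                a = greedy(q[learner])
            joint = [0, 0]
            joint[learner] = a
            joint[other] = fixed_action
            r = REWARD[joint[0]][joint[1]]
            # Stateless Q-learning update toward the observed shared reward.
            q[learner][a] += alpha * (r - q[learner][a])
    return greedy(q[0]), greedy(q[1])


if __name__ == "__main__":
    print(timeshared_q_learning())
```

With this particular payoff table the alternating best responses climb from the initial joint action (0, 0) to (2, 2). In general, alternating best responses of this kind can settle on a joint action that is not globally optimal, so the sketch should be read as an illustration of the turn-taking structure only.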
Keywords/Search Tags: cooperative multi-agent system, decentralized reinforcement learning, best response learning, timesharing tracking, adaptive learning