
Game, Equilibrium, and Knowledge Transfer in Multi-agent Reinforcement Learning

Posted on: 2016-08-05  Degree: Doctor  Type: Dissertation
Country: China  Candidate: Y J Hu  Full Text: PDF
GTID: 1488304802452784  Subject: Computer Science and Technology
Abstract/Summary:
In the research domains of artificial intelligence and multi-agent systems, multi-agent reinforcement learning (MARL) is an important learning technique. As the extension of reinforcement learning (RL) to the multi-agent setting, MARL has been successfully applied to many tasks, such as QoS routing in ATM and broadband networks. However, one problematic issue is that most MARL algorithms cannot scale up, owing to the expensive cost of equilibrium computation, the complexity of problem models, the lack of exploitation of learnt knowledge, and the curse of dimensionality in the state space and agent space. The goal of this thesis is to improve the scalability of multi-agent reinforcement learning using techniques from game theory, transfer learning, function approximation, and so on. The main contributions of this thesis can be summarized as follows.

1. One important approach to multi-agent reinforcement learning is equilibrium-based MARL. However, most existing algorithms have the unrealistic requirement that agents share their value functions. The first contribution of this thesis is a novel and efficient MARL algorithm with unshared value functions. First, in order to reduce computational cost, we define three types of pure strategy profiles as the equilibrium solution concepts: pure strategy Nash equilibrium, equilibrium-dominating strategy profile, and non-strict equilibrium-dominating strategy profile. Second, we propose a multi-step negotiation process for finding pure strategy equilibria, since value functions are not shared among agents. Putting these together, we propose a novel MARL algorithm called negotiation-based Q-learning (NegoQ). Experiments are first conducted in grid-world games, which are widely used to evaluate MARL algorithms. In these games, NegoQ learns equilibrium policies and runs significantly faster than existing MARL algorithms (correlated Q-learning and Nash Q-learning). Surprisingly, we find that NegoQ also performs well in team Markov games such as pursuit games, compared with team-task-oriented MARL algorithms (such as friend Q-learning and distributed Q-learning).

2. The second contribution of this thesis is an equilibrium transfer-based MARL framework which aims at accelerating equilibrium-based MARL. Most existing equilibrium-based MARL algorithms cannot scale due to the large number of computationally expensive equilibrium computations (e.g., computing Nash equilibria is PPAD-hard) performed during learning. For the first time, this thesis finds that during the learning process of equilibrium-based MARL, the one-shot games corresponding to successive visits of the same state often have the same or similar equilibria (for some states, more than 90% of the games corresponding to successive visits have similar equilibria). Inspired by this observation, this thesis proposes equilibrium transfer to accelerate equilibrium-based MARL. The key idea of equilibrium transfer is to reuse previously computed equilibria whenever each agent has only a small incentive to deviate. By introducing a transfer loss and a transfer condition, a novel framework called equilibrium transfer-based MARL is proposed. We prove that although equilibrium transfer incurs transfer loss, equilibrium-based MARL algorithms can still converge to an equilibrium policy under certain assumptions.
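To make the transfer idea concrete, the following minimal sketch (Python with NumPy) reuses a cached equilibrium of an earlier stage game whenever no agent could gain more than a small threshold by unilaterally deviating, and otherwise falls back to a full equilibrium computation. The function names, the `epsilon` threshold, and the `solve_equilibrium` callback are illustrative assumptions, not the interface defined in the thesis.

```python
import numpy as np

def deviation_incentive(payoffs, joint_action):
    """Largest gain any single agent could obtain by unilaterally deviating
    from joint_action. payoffs[i] is agent i's payoff array indexed by the
    joint action, with shape (|A_1|, ..., |A_n|)."""
    max_gain = 0.0
    for i, payoff in enumerate(payoffs):
        current = payoff[joint_action]
        # Best response of agent i with the other agents' actions held fixed.
        index = tuple(slice(None) if k == i else a for k, a in enumerate(joint_action))
        best = payoff[index].max()
        max_gain = max(max_gain, best - current)
    return max_gain

def transfer_or_compute(payoffs, cached_equilibrium, solve_equilibrium, epsilon=0.05):
    """Equilibrium transfer (sketch): reuse the cached equilibrium if every
    agent's incentive to deviate stays below epsilon; otherwise run the
    expensive solver on the current one-shot game."""
    if (cached_equilibrium is not None
            and deviation_incentive(payoffs, cached_equilibrium) <= epsilon):
        return cached_equilibrium
    return solve_equilibrium(payoffs)
```

Under this sketch, the cached joint action keeps being reused as long as a state's successive one-shot games stay close to one another, which is exactly the regularity the observation above exploits.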
Experimental results in widely used benchmarks (e.g., grid-world games, soccer games, and wall games) show that the proposed framework (1) not only significantly accelerates equilibrium-based MARL (up to a 96.7% reduction in learning time) but also achieves higher average rewards than algorithms without equilibrium transfer, and (2) scales significantly better than algorithms without equilibrium transfer as the state/action space grows and the number of agents increases.

3. The third contribution of this thesis is a set of three knowledge transfer methods for improving the learning performance of MARL in multi-agent systems with sparse interactions. In many multi-agent systems, the interactions between agents are sparse, and exploiting this interaction sparseness in MARL can improve learning performance. Moreover, agents may have already learnt some single-agent knowledge (e.g., a local value function) before the multi-agent learning process. In this thesis, we investigate how such knowledge can be utilized to learn better policies in multi-agent systems with sparse interactions. We adopt game theory-based MARL as the basic learning approach since it coordinates agents better. We propose three knowledge transfer methods. The first is value function transfer (VFT), which directly transfers agents' local value functions to the learning algorithm. The second is selective value function transfer (SVFT), which only transfers value functions in states where the environmental dynamics change slightly; we define a similarity between MDPs to model such changes and propose an approach to evaluate the similarity value. The last is model transfer-based game abstraction (MTGA), which further improves the former two mechanisms by abstracting the one-shot game in each state and reducing equilibrium computation. Experimental results in benchmarks show that with the three knowledge transfer methods, all of the tested game theory-based MARL algorithms are drastically improved and also achieve better asymptotic performance than the state-of-the-art algorithm CQ-learning.

4. The last contribution of this thesis is an improved version of SVFT and MTGA. The MDP similarity defined for SVFT and MTGA is based on the bisimulation metric. However, the bisimulation metric has high computational and space complexity, so it cannot be used in large-scale problems. To this end, we define a novel concept of MDP similarity based on the distribution of the N-step return (NSR) of each state in an MDP. We also propose an algorithm for learning the NSR distribution with guarantees on the error bound. Our analysis shows that the computational and space complexity of learning the NSR distribution is much lower than that of computing the bisimulation metric. Based on this novel concept of MDP similarity, we propose an NSR similarity-based selective value function transfer method (SVFT-NSR) and an NSR similarity-based game abstraction method (GA-NSR). For solving problems with large-scale state spaces, we introduce linear function approximation (LFA) into equilibrium-based MARL and propose LFA versions of SVFT-NSR and GA-NSR, in which Gaussian process regression is adopted to learn the NSR distribution over the state-feature space. Experiments are conducted in benchmarks and in the well-known video game Ms. PacMan to test the tabular and LFA versions of SVFT-NSR and GA-NSR, respectively. The results in the benchmarks show that, compared with SVFT and MTGA, SVFT-NSR and GA-NSR significantly reduce the runtime of the tested learning algorithms while the quality of the learnt policy remains unchanged. The results in Ms. PacMan show that the ghosts trained by the learning algorithm with SVFT-NSR and GA-NSR can effectively stop PacMan from scoring.
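As a rough illustration of the NSR-based MDP similarity in the last contribution, the sketch below builds an empirical N-step-return distribution for a state from sampled trajectories and compares two such distributions with a simple quantile distance. The sampling scheme, the quantile distance, and all parameter names are assumptions made for illustration only; they are not the thesis's actual estimator or its error-bounded learning algorithm.

```python
import numpy as np

def nsr_samples(episodes, state, n_steps, gamma=0.95):
    """Collect empirical N-step returns observed from `state`.
    episodes: list of trajectories, each a list of (state, reward) pairs."""
    samples = []
    for episode in episodes:
        for t, (s, _) in enumerate(episode):
            if s == state and t + n_steps <= len(episode):
                ret = sum(gamma ** k * episode[t + k][1] for k in range(n_steps))
                samples.append(ret)
    return np.asarray(samples)

def nsr_distance(samples_a, samples_b, n_quantiles=20):
    """Distance between two empirical NSR distributions, approximated here by
    comparing quantiles (a stand-in for whatever distributional distance is
    actually analyzed in the thesis)."""
    qs = np.linspace(0.05, 0.95, n_quantiles)
    return float(np.abs(np.quantile(samples_a, qs) - np.quantile(samples_b, qs)).mean())
```

States (or the same state before and after a change in dynamics) whose NSR distributions are close under such a distance would then be treated as similar, which is the role the bisimulation metric played for SVFT and MTGA, at a much lower computational cost.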
Keywords/Search Tags: Multi-agent Reinforcement Learning, Game Theory, Equilibrium, Equilibrium Transfer, Value Function Transfer, Game Abstraction