
On the performance of online concurrent reinforcement learners

Posted on: 2007-07-10    Degree: Ph.D    Type: Dissertation
University: Tulane University    Candidate: Banerjee, Bikramjit    Full Text: PDF
GTID: 1452390005481681    Subject: Engineering
Abstract/Summary:
Multiagent Reinforcement Learning (MARL) is significantly more complicated than single-agent Reinforcement Learning (RL) because multiple learners render each other's environments non-stationary. While RL, like most of machine learning, focuses on learning a fixed target function, MARL deals with learning a moving target function: in addition to the uncertainty of classical RL, each learner faces an extra level of uncertainty in the form of the behaviors of the other learners in the domain. Existing learning methods provide guarantees about a learner's performance only in the limit, since a learner approaches its desired behavior asymptotically; there is little insight into how well or how poorly an online learner can perform while it is still learning. This is the core problem studied in this dissertation, resulting in the following contributions.

First, the dissertation analyzes some existing MARL algorithms and their online performance when pitted against each other in adversarial situations. This analysis yields a novel characteristic of many MARL algorithms, which we call reactivity [1], that explains their observed behaviors. Optimizing this characteristic could produce safe learners, i.e., learners that can guarantee good payoffs in competitive situations, but the optimization involves a tradeoff with the agent's noise sensitivity.

Second, it sets up a novel mix of goals for a new MARL algorithm that achieves some basic learning objectives without knowing the types of the other agents: (1) learn the best-response behavior when the other agents in the domain exhibit (eventually) stationary behavior, and (2) jointly converge to a mutual equilibrium behavior when the other agents use the same learning algorithm, while also ensuring that, if the other agents are of neither type, the learner achieves some minimum average payoff that is "good" in some sense.

Third, the dissertation extends the class of no-regret algorithms to yield a class of algorithms that are shown [3] to achieve, in polynomial time with high likelihood, (1) close to best-response payoffs against (eventually) stationary opponents, (2) close to the best possible asymptotic payoffs against converging opponents, and (3) close to at least the minimax payoffs against any other opponents.

Fourth, the dissertation explores the cost of learning when the opponents are also adaptive.

Lastly, the dissertation validates all of its novel techniques and algorithms empirically, comparing them with existing techniques in simulations. (Abstract shortened by UMI.)
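To illustrate the non-stationarity that motivates this work, the following is a minimal sketch, not taken from the dissertation, of two independent Q-learners playing a repeated zero-sum matrix game: each learner's effective environment contains the other learner, so the payoff it estimates keeps shifting as the opponent adapts. The game (Matching Pennies), learning rate, and epsilon-greedy exploration are illustrative assumptions, not the algorithms studied in the dissertation.

    import random

    # Two independent Q-learners in repeated Matching Pennies (zero-sum).
    # Each agent treats the other as part of its environment, so each
    # agent's value estimates chase a moving, opponent-dependent target.

    # Payoff to the row player; the column player receives the negative.
    PAYOFF = [[1, -1],
              [-1, 1]]

    ALPHA, EPSILON, ROUNDS = 0.1, 0.1, 10000

    q_row = [0.0, 0.0]   # row player's value estimate for each action
    q_col = [0.0, 0.0]   # column player's value estimate for each action

    def choose(q):
        """Epsilon-greedy action selection over two actions."""
        if random.random() < EPSILON:
            return random.randrange(2)
        return max(range(2), key=lambda a: q[a])

    for _ in range(ROUNDS):
        a_row, a_col = choose(q_row), choose(q_col)
        r = PAYOFF[a_row][a_col]
        # Each learner updates toward its own stochastic, opponent-dependent payoff.
        q_row[a_row] += ALPHA * (r - q_row[a_row])
        q_col[a_col] += ALPHA * (-r - q_col[a_col])

    print("row Q-values:", q_row)
    print("col Q-values:", q_col)

In this sketch neither learner's estimates settle on a fixed target, which is the kind of online behavior, prior to any asymptotic guarantee, that the dissertation sets out to characterize and bound.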
Keywords/Search Tags:MARL, Reinforcement, Learners, Algorithms, Dissertation