Font Size: a A A

Fast, Scalable Algorithms for Reinforcement Learning in High Dimensional Domains

Posted on:2014-10-16Degree:Ph.DType:Thesis
University:University of Alberta (Canada)Candidate:Gendron-Bellemare, MarcFull Text:PDF
GTID:2458390005986378Subject:Computer Science
Abstract/Summary:
This thesis presents new algorithms for dealing with large scale reinforcement learning problems. Central to this work is the Atari 2600 platform, which acts as both a rich evaluation framework and a source of challenges for existing reinforcement learning methods. Three contributions are presented; common to all three is the idea of leveraging the highly structured nature of Atari 2600 games in order to achieve meaningful results.;The second part investigates the use of hashing in linear value function approximation. My work provides a new, theoretically sound hashing method for linear value function approximation based on prior work on sketches. Empirically, the new hashing method offers a significant performance advantage compared to traditional hashing, at a minuscule computational cost.;My third contribution is the quad-tree factorization (QTF) algorithm, an information- theoretic approach to the problem of predicting future Atari 2600 screens. The algorithm relies on the natural idea that future screens can be efficiently factored into image patches. QTF goes a step further by providing a hierarchicaldecomposition screen model, so that image patches are only as large as they need to be.;Together, the contributions in this thesis are motivated by the need to efficiently handle the Atari 2600's large observation space---the set of all possible game screens---in arbitrary Atari 2600 games. This work provides evidence that general, principled approximations can be devised to allow us to tackle the reinforcement learning problem within complex, natural domains.;The first part of this work formally introduces the notion of contingency awareness: the recognition that parts of an agent's observation are under its control, while others are solely determined by its environment. Together with this formalization, I provide empirical results showing that contingency awareness can be used to generate useful features for value-based reinforcement learning in Atari 2600 games.
Keywords/Search Tags:Reinforcement learning, Atari, Work
Related items