Fast, Scalable Algorithms for Reinforcement Learning in High Dimensional Domains

Posted on:2014-10-16

Degree:Ph.D

Type:Thesis

University:University of Alberta (Canada)

Candidate:Gendron-Bellemare, Marc

GTID:2458390005986378

Subject:Computer Science

Abstract/Summary:

This thesis presents new algorithms for dealing with large-scale reinforcement learning problems. Central to this work is the Atari 2600 platform, which acts as both a rich evaluation framework and a source of challenges for existing reinforcement learning methods. Three contributions are presented; common to all three is the idea of leveraging the highly structured nature of Atari 2600 games in order to achieve meaningful results.

The first part of this work formally introduces the notion of contingency awareness: the recognition that parts of an agent's observation are under its control, while others are solely determined by its environment. Together with this formalization, I provide empirical results showing that contingency awareness can be used to generate useful features for value-based reinforcement learning in Atari 2600 games.

The second part investigates the use of hashing in linear value function approximation. My work provides a new, theoretically sound hashing method for linear value function approximation based on prior work on sketches. Empirically, the new hashing method offers a significant performance advantage compared to traditional hashing, at a minuscule computational cost.

My third contribution is the quad-tree factorization (QTF) algorithm, an information-theoretic approach to the problem of predicting future Atari 2600 screens. The algorithm relies on the natural idea that future screens can be efficiently factored into image patches. QTF goes a step further by providing a hierarchical decomposition screen model, so that image patches are only as large as they need to be.

Together, the contributions in this thesis are motivated by the need to efficiently handle the Atari 2600's large observation space (the set of all possible game screens) in arbitrary Atari 2600 games. This work provides evidence that general, principled approximations can be devised to allow us to tackle the reinforcement learning problem within complex, natural domains.
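
The abstract does not spell out the hashing method, so the following is only a minimal sketch of the general idea of hashing in linear value function approximation, not the thesis's algorithm. It maps a large set of binary features into a small weight table, optionally multiplying each feature by a second, sign-valued hash in the style of sketch-based methods, so that colliding features tend to cancel rather than accumulate. All names here (`hashed_features`, `td0_update`, the MD5-based hash, the table size) are assumptions chosen for illustration.

```python
import hashlib


def _hash(key: str, salt: str, modulus: int) -> int:
    """Deterministically hash `key` into range(modulus) with a salted MD5 digest."""
    digest = hashlib.md5((salt + ":" + key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % modulus


def hashed_features(active, table_size, signed):
    """Map active binary feature names to a sparse {index: value} dict.

    With signed=False, colliding features simply add up (plain hashing).
    With signed=True, each feature is multiplied by a hashed sign of +1 or -1
    (an illustrative sketch-style variant, assumed here).
    """
    out = {}
    for name in active:
        idx = _hash(name, "index", table_size)
        sign = 1.0
        if signed and _hash(name, "sign", 2) == 0:
            sign = -1.0
        out[idx] = out.get(idx, 0.0) + sign
    return out


def value(weights, feats):
    """Linear value estimate: dot product of weights and sparse features."""
    return sum(weights[i] * v for i, v in feats.items())


def td0_update(weights, feats, reward, next_feats, alpha=0.1, gamma=0.99):
    """Apply one TD(0) update in place and return the TD error."""
    delta = reward + gamma * value(weights, next_feats) - value(weights, feats)
    for i, v in feats.items():
        weights[i] += alpha * delta * v
    return delta


if __name__ == "__main__":
    table_size = 16  # far smaller than the number of possible features
    weights = [0.0] * table_size
    s = hashed_features(["paddle_x=3", "ball_y=7"], table_size, signed=True)
    s_next = hashed_features(["paddle_x=4", "ball_y=6"], table_size, signed=True)
    print("TD error:", td0_update(weights, s, 1.0, s_next))
```

The weight table has a fixed size regardless of how many distinct features occur, which is what makes hashing attractive when the observation space is very large.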

Keywords/Search Tags:

Reinforcement learning, Atari, Work

Related items

1	Supervised Reinforcement Learning: Methods And Applications
2	Making reinforcement learning work on real robots
3	Research On Reinforcement Learning Based Control Method Of Magnetic Navigation AGV
4	Reinforcement Learning Based On Spectral Graph Theory
5	Study Of Multi-agent Learning Problem Based On Reinforcement Learning
6	Research On Sample-efficient Reinforcement Learning Methods
7	Research And Implementation Of Reinforcement Learning Method About Transport Strategy Between Carrier-based Aircraft Station
8	Research On Reinforcement Learning Based On Hidden Space Modeling
9	Research On Security Deep Reinforcement Learning Based On Experiences
10	Research On Group Confrontation Strategies Based On Deep Reinforcement Learning