
Reinforcement learning for factored Markov decision processes

Posted on: 2003-07-30
Degree: Ph.D.
Type: Thesis
University: University of Toronto (Canada)
Candidate: Sallans, Brian A
Full Text: PDF
GTID: 2468390011981807
Subject: Computer Science
Abstract/Summary:
Learning to act optimally in a complex, dynamic and noisy environment is a hard problem. Various threads of research from reinforcement learning, animal conditioning, operations research, machine learning, statistics and optimal control are beginning to come together to offer solutions to this problem. In this thesis I present novel algorithms for learning the dynamics, learning the value function, and selecting good actions in Markov decision processes. The problems considered have high-dimensional factored state and action spaces, and are either fully or partially observable. My approach is to recognize similarities between the problems being solved in the reinforcement learning and graphical models literatures, and to use and combine techniques from the two fields in novel ways.

In particular, I present two new algorithms. First, the DBN algorithm learns a compact dynamic Bayesian network (DBN) representation of the core process of a partially observable Markov decision process (POMDP). Because exact inference in the DBN is intractable, I use approximate inference to maintain the belief state. A belief-state action-value function is then learned using reinforcement learning. I show that this DBN algorithm can solve POMDPs with very large state spaces and useful hidden state. Second, the PoE algorithm learns an approximation to value functions over large factored state-action spaces. The algorithm approximates values as (negative) free energies in a product of experts model. The model parameters can be learned efficiently because inference is tractable in a product of experts. I show that good actions can be found, even in large factored action spaces, by brief Gibbs sampling.

These two new algorithms take techniques from the machine learning community and apply them in new ways to reinforcement learning problems. Simulation results show that these new methods can solve very large problems. The DBN method is used to solve a POMDP with a hidden state space and an observation space of size greater than 2^180. The DBN model of the core process has 2^32 states, represented as 32 binary variables. The PoE method is used to find actions in action spaces of size 2^40.
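To make the DBN approach concrete, here is a minimal sketch of the two ingredients the abstract describes: an approximate, fully factored (mean-field style) belief-state filter over binary latent variables, and a linear action-value function that takes the belief vector as its features, trained with a SARSA-style temporal-difference update. The logistic transition and observation models, the names `filter_step` and `sarsa_update`, and all weight matrices are hypothetical stand-ins for the learned DBN; this is not the thesis code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def obs_loglik(z_mean, obs, O):
    """Log-likelihood of a binary observation under a logistic model,
    with the latent variables evaluated at their mean-field means."""
    p = sigmoid(z_mean @ O)
    return np.sum(obs * np.log(p) + (1.0 - obs) * np.log(1.0 - p))

def filter_step(belief, a_onehot, obs, T, A, O):
    """One crude factored filtering step: predict each latent bit from
    the previous belief and the action, then reweight it by the new
    observation with the other bits held at their predicted means."""
    prior = sigmoid(belief @ T + a_onehot @ A)   # predicted P(z_i = 1)
    new_belief = np.empty_like(prior)
    for i in range(prior.size):
        z1, z0 = prior.copy(), prior.copy()
        z1[i], z0[i] = 1.0, 0.0
        log_odds = (np.log(prior[i]) - np.log(1.0 - prior[i])
                    + obs_loglik(z1, obs, O) - obs_loglik(z0, obs, O))
        new_belief[i] = sigmoid(log_odds)
    return new_belief

def sarsa_update(theta, belief, a, r, belief2, a2, lr=0.1, gamma=0.95):
    """SARSA update for a linear belief-state action-value function:
    Q(b, a) = theta[a] . b, with the belief vector as the features."""
    td = r + gamma * theta[a2] @ belief2 - theta[a] @ belief
    theta[a] += lr * td * belief
    return theta
```

With 32 latent bits, each filtering step updates only 32 numbers, rather than the 2^32 entries an exact belief state over the core process would require.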
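The PoE value function admits an equally compact sketch. The code below assumes the product of experts is a restricted Boltzmann machine with binary state and action variables as visibles and a layer of hidden experts: Q(s, a) is computed as the negative free energy, actions are chosen by brief Gibbs sampling over the action bits with the state clamped, and the parameters are trained with a SARSA-style update. The class name, layer sizes, and learning rates are illustrative assumptions, not values from the thesis.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class PoEValue:
    """Product-of-experts value function, assuming an RBM: binary state
    and action bits as visibles, hidden "experts", Q(s, a) ~ -F(s, a)."""

    def __init__(self, n_state=32, n_action=40, n_hidden=64, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W_s = self.rng.normal(0.0, 0.01, (n_state, n_hidden))
        self.W_a = self.rng.normal(0.0, 0.01, (n_action, n_hidden))
        self.b_h = np.zeros(n_hidden)

    def _input(self, s, a):
        return s @ self.W_s + a @ self.W_a + self.b_h

    def q(self, s, a):
        # -F(s, a) = sum_j log(1 + exp(x_j)); tractable because the
        # experts are conditionally independent given the visibles.
        return np.sum(np.logaddexp(0.0, self._input(s, a)))

    def sample_action(self, s, n_sweeps=10):
        """Brief Gibbs sampling over the action bits, state clamped."""
        a = self.rng.integers(0, 2, self.W_a.shape[0]).astype(float)
        for _ in range(n_sweeps):
            h = (self.rng.random(self.b_h.size)
                 < sigmoid(self._input(s, a))).astype(float)
            a = (self.rng.random(a.size)
                 < sigmoid(h @ self.W_a.T)).astype(float)
        return a

    def sarsa_update(self, s, a, r, s2, a2, lr=0.01, gamma=0.95):
        """SARSA-style temporal-difference update on -F(s, a)."""
        td = r + gamma * self.q(s2, a2) - self.q(s, a)
        p_h = sigmoid(self._input(s, a))   # posterior over the experts
        self.W_s += lr * td * np.outer(s, p_h)
        self.W_a += lr * td * np.outer(a, p_h)
        self.b_h += lr * td * p_h
```

Each Gibbs sweep touches only the 40 action bits, so an action space of size 2^40 can be searched for good actions without ever enumerating it, which is the point of the brief-sampling trick described in the abstract.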
Keywords/Search Tags: Reinforcement learning, Action spaces, Factored, DBN