
Reinforcement learning for factored Markov decision processes

Posted on: 2003-07-30
Degree: Ph.D.
Type: Thesis
University: University of Toronto (Canada)
Candidate: Sallans, Brian A
Full Text: PDF
GTID: 2468390011981807
Subject: Computer Science
Abstract/Summary:
Learning to act optimally in a complex, dynamic and noisy environment is a hard problem. Various threads of research from reinforcement learning, animal conditioning, operations research, machine learning, statistics and optimal control are beginning to come together to offer solutions to this problem. In this thesis I present novel algorithms for learning the dynamics, learning the value function, and selecting good actions in Markov decision processes. The problems considered have high-dimensional factored state and action spaces, and are either fully or partially observable. My approach is to recognize similarities between the problems being solved in the reinforcement learning and graphical models literatures, and to use and combine techniques from the two fields in novel ways.

In particular, I present two new algorithms. First, the DBN algorithm learns a compact dynamic Bayesian network (DBN) representation of the core process of a partially observable Markov decision process (POMDP). Because exact inference in the DBN is intractable, I use approximate inference to maintain the belief state. A belief-state action-value function is then learned using reinforcement learning. I show that this DBN algorithm can solve POMDPs with very large state spaces and useful hidden state. Second, the PoE algorithm learns an approximation to value functions over large factored state-action spaces. The algorithm approximates values as (negative) free energies in a product of experts model. The model parameters can be learned efficiently because inference is tractable in a product of experts. I show that good actions can be found, even in large factored action spaces, by brief Gibbs sampling.

These two new algorithms take techniques from the machine learning community and apply them in new ways to reinforcement learning problems. Simulation results show that these new methods can solve very large problems. The DBN method is used to solve a POMDP with a hidden state space and an observation space of size greater than 2^180. The DBN model of the core process has 2^32 states, represented as 32 binary variables. The PoE method is used to find actions in action spaces of size 2^40.
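To make the DBN approach concrete, here is a minimal sketch of the two ingredients the abstract describes: an approximate, fully factored (mean-field style) belief-state filter over binary latent variables, and a linear action-value function that takes the belief vector as its features, trained with a SARSA-style temporal-difference update. The logistic transition and observation models, the names `filter_step` and `sarsa_update`, and all weight matrices are hypothetical stand-ins for the learned DBN; this is not the thesis code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def obs_loglik(z_mean, obs, O):
    """Log-likelihood of a binary observation under a logistic model,
    with the latent variables evaluated at their mean-field means."""
    p = sigmoid(z_mean @ O)
    return np.sum(obs * np.log(p) + (1.0 - obs) * np.log(1.0 - p))

def filter_step(belief, a_onehot, obs, T, A, O):
    """One crude factored filtering step: predict each latent bit from
    the previous belief and the action, then reweight it by the new
    observation with the other bits held at their predicted means."""
    prior = sigmoid(belief @ T + a_onehot @ A)   # predicted P(z_i = 1)
    new_belief = np.empty_like(prior)
    for i in range(prior.size):
        z1, z0 = prior.copy(), prior.copy()
        z1[i], z0[i] = 1.0, 0.0
        log_odds = (np.log(prior[i]) - np.log(1.0 - prior[i])
                    + obs_loglik(z1, obs, O) - obs_loglik(z0, obs, O))
        new_belief[i] = sigmoid(log_odds)
    return new_belief

def sarsa_update(theta, belief, a, r, belief2, a2, lr=0.1, gamma=0.95):
    """SARSA update for a linear belief-state action-value function:
    Q(b, a) = theta[a] . b, with the belief vector as the features."""
    td = r + gamma * theta[a2] @ belief2 - theta[a] @ belief
    theta[a] += lr * td * belief
    return theta
```

With 32 latent bits, each filtering step updates only 32 numbers, rather than the 2^32 entries an exact belief state over the core process would require.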
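The PoE value function admits an equally compact sketch. The code below assumes the product of experts is a restricted Boltzmann machine with binary state and action variables as visibles and a layer of hidden experts: Q(s, a) is computed as the negative free energy, actions are chosen by brief Gibbs sampling over the action bits with the state clamped, and the parameters are trained with a SARSA-style update. The class name, layer sizes, and learning rates are illustrative assumptions, not values from the thesis.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class PoEValue:
    """Product-of-experts value function, assuming an RBM: binary state
    and action bits as visibles, hidden "experts", Q(s, a) ~ -F(s, a)."""

    def __init__(self, n_state=32, n_action=40, n_hidden=64, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W_s = self.rng.normal(0.0, 0.01, (n_state, n_hidden))
        self.W_a = self.rng.normal(0.0, 0.01, (n_action, n_hidden))
        self.b_h = np.zeros(n_hidden)

    def _input(self, s, a):
        return s @ self.W_s + a @ self.W_a + self.b_h

    def q(self, s, a):
        # -F(s, a) = sum_j log(1 + exp(x_j)); tractable because the
        # experts are conditionally independent given the visibles.
        return np.sum(np.logaddexp(0.0, self._input(s, a)))

    def sample_action(self, s, n_sweeps=10):
        """Brief Gibbs sampling over the action bits, state clamped."""
        a = self.rng.integers(0, 2, self.W_a.shape[0]).astype(float)
        for _ in range(n_sweeps):
            h = (self.rng.random(self.b_h.size)
                 < sigmoid(self._input(s, a))).astype(float)
            a = (self.rng.random(a.size)
                 < sigmoid(h @ self.W_a.T)).astype(float)
        return a

    def sarsa_update(self, s, a, r, s2, a2, lr=0.01, gamma=0.95):
        """SARSA-style temporal-difference update on -F(s, a)."""
        td = r + gamma * self.q(s2, a2) - self.q(s, a)
        p_h = sigmoid(self._input(s, a))   # posterior over the experts
        self.W_s += lr * td * np.outer(s, p_h)
        self.W_a += lr * td * np.outer(a, p_h)
        self.b_h += lr * td * p_h
```

Each Gibbs sweep touches only the 40 action bits, so an action space of size 2^40 can be searched for good actions without ever enumerating it, which is the point of the brief-sampling trick described in the abstract.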
Keywords/Search Tags: Reinforcement learning, Action spaces, Factored, DBN