
Bayesian exploration in Markov decision processes

Posted on: 2008-09-11
Degree: M.Sc
Type: Thesis
University: McGill University (Canada)
Candidate: Castro, Pablo Samuel
Full Text: PDF
GTID: 2448390005471014
Subject: Computer Science
Abstract/Summary:
Markov Decision Processes are a mathematical framework widely used for stochastic optimization and control problems. Reinforcement Learning is a branch of Artificial Intelligence that deals with stochastic environments in which the dynamics of the system are unknown. A major issue for learning algorithms is the need to balance the exploration of new experiences with the exploitation of existing knowledge. We present three methods for handling this exploration-exploitation tradeoff in Markov Decision Processes. The approach taken is Bayesian, in that we maintain and update a model estimate. The existence of an optimal policy for Bayesian exploration has been shown, but its computation is infeasible. We present three approximations to the optimal policy based on statistical sampling.

The first approach uses a combination of Linear Programming and Q-learning, and we present empirical results demonstrating its performance. The second approach extends this idea, and we prove theoretical guarantees along with empirical evidence of its performance. Finally, we present an algorithm that adapts efficiently to the amount of computation time available. This algorithm is derived as an approximation to an infinite-dimensional linear program, for which we guarantee convergence and prove strong duality.
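To make the Bayesian model-based setting concrete, the sketch below illustrates one standard sampling-based approximation in this family: maintain a Dirichlet posterior over each state-action pair's transition distribution, sample a model from the posterior, plan in the sampled model, and update the posterior from observed transitions (posterior, or Thompson, sampling). This is an illustrative assumption, not the thesis's specific algorithms; the toy MDP, rewards, and all parameter values are invented for the example.

```python
import random

# Toy 2-state, 2-action MDP with unknown transitions. The agent keeps a
# Dirichlet posterior per (state, action) and explores by posterior sampling:
# sample one model, act greedily in it, observe, update.
# (Sketch only; the thesis's LP/Q-learning approximations are not shown.)

N_STATES, N_ACTIONS = 2, 2
GAMMA = 0.9

# True (hidden) dynamics and rewards, used only to simulate the environment.
TRUE_P = {(0, 0): [0.9, 0.1], (0, 1): [0.2, 0.8],
          (1, 0): [0.7, 0.3], (1, 1): [0.1, 0.9]}
REWARD = {(0, 0): 0.0, (0, 1): 0.1, (1, 0): 0.0, (1, 1): 1.0}

def sample_dirichlet(alpha):
    """Draw one distribution from Dirichlet(alpha) via Gamma variates."""
    draws = [random.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def value_iteration(P, n_iters=200):
    """Compute Q-values for the sampled transition model P."""
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in range(N_ACTIONS)}
    for _ in range(n_iters):
        V = [max(Q[(s, a)] for a in range(N_ACTIONS)) for s in range(N_STATES)]
        for (s, a), probs in P.items():
            Q[(s, a)] = REWARD[(s, a)] + GAMMA * sum(
                p * V[s2] for s2, p in enumerate(probs))
    return Q

def run(episodes=300, horizon=20, seed=0):
    random.seed(seed)
    # Dirichlet(1, ..., 1) prior: one pseudo-count per successor state.
    counts = {(s, a): [1.0] * N_STATES
              for s in range(N_STATES) for a in range(N_ACTIONS)}
    s = 0
    for _ in range(episodes):
        # Sample one transition model from the posterior, then plan in it.
        P = {sa: sample_dirichlet(alpha) for sa, alpha in counts.items()}
        Q = value_iteration(P)
        for _ in range(horizon):
            a = max(range(N_ACTIONS), key=lambda act: Q[(s, act)])
            s2 = random.choices(range(N_STATES), weights=TRUE_P[(s, a)])[0]
            counts[(s, a)][s2] += 1.0  # posterior update from the observation
            s = s2
    return counts

counts = run()
```

Randomness in the sampled model drives exploration early on, when the posterior is diffuse; as transition counts accumulate, sampled models concentrate around the true dynamics and the policy converges to exploitation.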
Keywords/Search Tags: Decision, Bayesian, Exploration