
Optimal learning: Computational procedures for Bayes-adaptive Markov decision processes

Posted on: 2003-09-21
Degree: Ph.D
Type: Dissertation
University: University of Massachusetts Amherst
Candidate: Duff, Michael O'Gordon
Full Text: PDF
GTID: 1468390011983255
Subject: Computer Science
Abstract/Summary:
This dissertation considers a particular aspect of sequential decision making under uncertainty in which, at each stage, a decision-making agent operating in an uncertain world takes an action that elicits a reinforcement signal and causes the state of the world to change. Optimal learning is a pattern of behavior that yields the highest expected total reward over the entire duration of the agent's interaction with its uncertain world. Determining an optimal learning strategy is therefore a kind of meta-problem, with optimality defined with respect to a distribution over the environments the agent is likely to encounter. Given this prior uncertainty, the optimal-learning agent must collect and use information intelligently, balancing greedy exploitation of certainty-equivalent world models against exploratory actions aimed at discerning the true state of nature.

My approach to approximating optimal learning strategies retains the full model of the sequential decision process: by incorporating a Bayesian model for the evolving uncertainty about unknown process parameters, the problem takes the form of a Markov decision process defined over a set of "hyperstates" whose cardinality grows exponentially with the planning horizon.

I develop computational procedures that retain this full Bayesian formulation but sidestep intractability by employing techniques from reinforcement learning theory, specifically Monte-Carlo simulation and parameterized function approximators. Because the approach is grounded in a complete Bayesian world model, the resulting algorithms produce policies that exhibit performance gains over simple heuristics.
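To make the hyperstate idea concrete, here is a minimal illustrative sketch (my own, not code from the dissertation) for a Bayes-adaptive two-armed Bernoulli bandit. A hyperstate is the tuple of Beta-posterior counts (successes, failures) for each arm; each action and observed reward maps one hyperstate to another, and the set of reachable hyperstates grows with the planning horizon:

```python
# Illustrative sketch (not from the dissertation): hyperstates for a
# Bayes-adaptive two-armed Bernoulli bandit. A hyperstate is one
# (successes, failures) Beta-posterior count pair per arm.

def update(hyperstate, arm, reward):
    """Bayesian belief update: bump the chosen arm's counts."""
    counts = list(hyperstate)
    s, f = counts[arm]
    counts[arm] = (s + 1, f) if reward else (s, f + 1)
    return tuple(counts)

def next_layer(hyperstates, n_arms=2):
    """All hyperstates reachable in one more decision stage."""
    return {update(h, arm, r)
            for h in hyperstates
            for arm in range(n_arms)
            for r in (0, 1)}

layer = {((1, 1), (1, 1))}   # uniform Beta(1,1) prior on each arm
for t in range(1, 6):
    layer = next_layer(layer)
    print(t, len(layer))     # 4, 10, 20, 35, 56: the set grows with the horizon
```

For bandits the growth is polynomial; for general Bayes-adaptive MDPs, where the posterior tracks a transition model per state-action pair, the hyperstate set grows exponentially with the horizon, which is the intractability the abstract refers to.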
Moreover, in contrast to many heuristics, the legitimacy of these policies follows directly from the fact that they are motivated by a complete characterization of the underlying decision problem to be solved.

This dissertation's contributions include a reinforcement learning algorithm for estimating Gittins indices for multi-armed bandit problems, a Monte-Carlo gradient-based algorithm for approximating solutions to general problems of optimal learning, a gradient-based scheme for improving optimal learning policies instantiated as finite-state stochastic automata, and an investigation of diffusion processes as analytical models for evolving uncertainty.
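To convey the flavor of a Monte-Carlo, gradient-based approach, here is a hedged sketch (my own illustration, not the dissertation's algorithm): a one-parameter exploration policy for a two-armed Bernoulli bandit is evaluated by averaging simulated returns over environments drawn from the prior, and the parameter is then improved by finite-difference gradient ascent:

```python
import math
import random

def rollout(theta, horizon=20, rng=random):
    """Simulate one environment drawn from the prior; return the total
    reward earned by the theta-parameterized policy."""
    p = [rng.random(), rng.random()]      # true arm means ~ uniform prior
    counts = [[1, 1], [1, 1]]             # Beta(1,1) posterior per arm
    eps = 1.0 / (1.0 + math.exp(-theta))  # exploration rate in (0, 1)
    total = 0
    for _ in range(horizon):
        means = [s / (s + f) for s, f in counts]
        if rng.random() < eps:            # explore uniformly ...
            arm = rng.randrange(2)
        else:                             # ... or exploit the posterior mean
            arm = max(range(2), key=lambda a: means[a])
        r = 1 if rng.random() < p[arm] else 0
        counts[arm][1 - r] += 1           # r=1 bumps successes, r=0 failures
        total += r
    return total

def mc_value(theta, n=2000, seed=0):
    """Monte-Carlo estimate of expected return under the prior."""
    rng = random.Random(seed)
    return sum(rollout(theta, rng=rng) for _ in range(n)) / n

# Finite-difference gradient ascent on the exploration parameter.
theta, step, h = 0.0, 1.0, 0.5
for _ in range(5):
    grad = (mc_value(theta + h) - mc_value(theta - h)) / (2 * h)
    theta += step * grad
```

Averaging returns over environments sampled from the prior is what makes this a "meta-problem" in the sense described above: the policy is scored against the distribution of worlds the agent might face, not against any single fixed world.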
Keywords/Search Tags: Optimal learning, Decision, Process, Uncertainty