
Dyna Learning with Deep Belief Networks

Posted on: 2012-09-16
Degree: M.Sc
Type: Thesis
University: McGill University (Canada)
Candidate: Faulkner, Ryan
Full Text: PDF
GTID: 2468390011459948
Subject: Computer Science
Abstract/Summary:
The objective of reinforcement learning is to find "good" actions in an environment where feedback is provided through a numerical reward and the current state (i.e. sensory input) is assumed to be available at each time step. "Good" is defined as maximizing the expected cumulative reward over time. It is sometimes useful to construct a model of the environment to aid in solving the problem. We investigate Dyna-style reinforcement learning, a powerful approach for problems where little real data is available. The main idea is to supplement real trajectories with simulated ones sampled from a learned model of the environment. In large state spaces, however, learning a good generative model of the environment has remained an open problem. We propose to use deep belief networks to learn the environment model. Deep belief networks [11] are generative models that have been effective at learning temporal dependencies in complex data. It has been shown that such models can be learned in a reasonable amount of time when they are built from energy-based models. We present our algorithm for using deep belief networks as a generative model that simulates the environment within the Dyna architecture, along with promising empirical results.
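The Dyna idea described above, mixing real transitions with transitions replayed from a learned model, can be illustrated with a minimal tabular Dyna-Q sketch. This is not the thesis's algorithm: the learned model here is a plain lookup table standing in for the deep belief network, and the six-state chain environment, the function names (`step`, `dyna_q`), and all hyperparameter values are assumptions chosen only for illustration.

```python
import random
from collections import defaultdict

N_STATES = 6      # hypothetical toy chain: states 0..5, state 5 is terminal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment: reward 1.0 on reaching the right end, else 0.0."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def dyna_q(episodes=100, planning_steps=10, alpha=0.5, gamma=0.95,
           eps=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # q[(state, action)] -> estimated value
    model = {}              # model[(state, action)] -> (reward, next_state, done)

    def greedy(s):
        # Greedy action with random tie-breaking.
        best = max(q[(s, a)] for a in ACTIONS)
        return rng.choice([a for a in ACTIONS if q[(s, a)] == best])

    def update(s, a, r, s2, done):
        # One-step Q-learning backup.
        target = r if done else r + gamma * max(q[(s2, b)] for b in ACTIONS)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.choice(ACTIONS) if rng.random() < eps else greedy(s)
            s2, r, done = step(s, a)
            update(s, a, r, s2, done)      # learn from the real transition
            model[(s, a)] = (r, s2, done)  # record it in the tabular model
            for _ in range(planning_steps):
                # Planning: replay a transition sampled from the model.
                (ps, pa), (pr, ps2, pdone) = rng.choice(list(model.items()))
                update(ps, pa, pr, ps2, pdone)
            s = s2
    return q
```

Calling `dyna_q()` returns the learned action values; on this chain the greedy policy should prefer moving right in every non-terminal state. In a large state space the lookup table cannot generalize to unseen states, which is the gap a learned generative model is meant to fill.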
Keywords/Search Tags: Belief networks, Environment, Model