
Learning control policies from demonstration in continuous sensory and action space

Posted on: 2016-03-03 | Degree: Ph.D | Type: Dissertation
University: University of Florida | Candidate: McLeod, Adam M | Full Text: PDF
GTID: 1472390017477079 | Subject: Electrical engineering
Abstract/Summary:
Learning to control a bipedal robot is a difficult control task; learning to do so rapidly or automatically is more difficult still. This dissertation discusses techniques that take advantage of a person's innate ability to solve this kind of problem and leverage it to train a software agent. This approach is called 'Learning from Demonstration' (LfD), and here it is applied to a control problem that embodies the dominant physical principle underlying legged locomotion: the inverted pendulum. Human volunteers are tasked with mastering an inverted-pendulum game, and logs of their actions are used to guide a learning model similar in structure to a reinforcement learner. When given access to the human-generated examples of game mastery, the learning model finds a control solution as good as or better than the human demonstrations, while a comparable reinforcement learning model without human-generated data to guide it is unable to find a solution.

Encoding a control policy is a related but distinct problem from policy learning. In addition to the LfD experiments described above, unsupervised feature learning for policy representation is also investigated. Unsupervised feature learning, a form of dimensionality reduction, can be a vital aid to general learning tasks. High dimensionality can make analysis via dimension-reducing algorithms such as Sparsenet or FastICA infeasible due to memory constraints. These techniques require a fixed quantization of the data space, so their performance depends on the size of the data space (measured in dimensions) and on the quantization method used. If the data space is small to begin with, good performance can be expected; if the space is large, poor performance can be expected unless a vast amount of memory is spent to finely (but naively) quantize the space, or a priori knowledge of a method that allows efficient quantization is available.
However, if there exist underlying patterns within the dataset that sparsely populate the total volume of the data space, then it is possible to extract sparse codes without the need to observe whole codes, without enumerating large vectors within a learning algorithm, and without special knowledge of the underlying structure of the data being analyzed. Here, we introduce an algorithm capable of finding an overcomplete set of sparse codes that exist within high-dimensional data spaces. By using the Sparsenet training paradigm to train a set of computational models for regression (e.g. neural networks) as opposed to updating static vectors, we can learn a sparse set of codes from partial training examples. The technique also decouples the memory footprint of the system from the dimensionality of the data being analyzed, allowing the model to encode high-dimensional patterns.
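The partial-example idea above can be illustrated with a minimal sketch. This is not the dissertation's implementation: it is a Sparsenet-style dictionary learner in which every training sample reveals only a random masked subset of its dimensions, and both coefficient inference and dictionary updates use only the observed entries. All names, hyperparameters, and the gradient-descent inference step are illustrative assumptions.

```python
# Hypothetical sketch: learning an overcomplete sparse dictionary from PARTIAL
# training examples. Each sample exposes ~60% of its dimensions via a mask,
# yet a dictionary over the full space is learned. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
dim, n_codes, n_samples = 20, 30, 300        # overcomplete: n_codes > dim

# Synthesize data from a ground-truth dictionary with sparse coefficients.
D_true = rng.normal(size=(dim, n_codes))
D_true /= np.linalg.norm(D_true, axis=0)
A_true = rng.normal(size=(n_codes, n_samples)) * (rng.random((n_codes, n_samples)) < 0.1)
X = D_true @ A_true
masks = (rng.random((dim, n_samples)) < 0.6).astype(float)

def infer(x, m, D, lam=0.05, lr=0.1, steps=60):
    """Gradient descent on sparse coefficients, using observed entries only."""
    a = np.zeros(D.shape[1])
    for _ in range(steps):
        r = m * (x - D @ a)                  # residual on observed dimensions
        a += lr * (D.T @ r - lam * np.sign(a))
    return a

def masked_error(D):
    """Mean squared reconstruction error over observed entries."""
    errs = []
    for i in range(n_samples):
        a = infer(X[:, i], masks[:, i], D)
        r = masks[:, i] * (X[:, i] - D @ a)
        errs.append(r @ r / masks[:, i].sum())
    return float(np.mean(errs))

D = rng.normal(size=(dim, n_codes))
D /= np.linalg.norm(D, axis=0)
err_before = masked_error(D)

for _ in range(5):                           # dictionary-learning epochs
    for i in range(n_samples):
        a = infer(X[:, i], masks[:, i], D)
        r = masks[:, i] * (X[:, i] - D @ a)
        D += 0.02 * np.outer(r, a)           # update driven by observed residual only
        D /= np.linalg.norm(D, axis=0)       # keep dictionary columns unit-norm

err_after = masked_error(D)
print(f"masked reconstruction error: {err_before:.4f} -> {err_after:.4f}")
```

With an all-ones mask this reduces to ordinary Sparsenet-style dictionary learning; the masking is what decouples learning from ever observing whole vectors, which is the property the abstract highlights. Replacing the static dictionary columns with trained regression models (e.g., small neural networks), as the dissertation describes, additionally decouples memory footprint from the data dimensionality.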
Keywords/Search Tags: Space, Data, Model