
Adversarial Inverse Reinforcement Learning with Changing Dynamics

Posted on: 2018-07-24
Degree: M.S.
Type: Thesis
University: University of Illinois at Chicago
Candidate: Tirinzoni, Andrea
Full Text: PDF
GTID: 2478390020956795
Subject: Artificial Intelligence
Abstract/Summary:
Most work on inverse reinforcement learning, the problem of recovering the unknown reward function being optimized by a decision-making agent, has focused on cases where optimal demonstrations are provided under a single dynamics. We analyze the more general setting where the learner has access to sub-optimal demonstrations gathered under several different dynamics.

We argue that several problems, such as learning under covariate shift or risk aversion, can be modeled in this way.

We propose an adversarial formulation in which the learner tries to imitate a constrained, worst-case estimate of the demonstrator's control policy. We adopt the method of Lagrange multipliers to remove the constraints and obtain a convex optimization problem.

We prove that the constraints imposed by the multiple dynamics lead to an NP-hard optimization subproblem: computing a deterministic policy that maximizes the total expected reward across several different Markov decision processes. We propose a tractable approximation by reducing this subproblem to the optimal control of a partially observable Markov decision process.

We evaluate the performance of our algorithm on two synthetic problems. In the first, we try to recover the reward function of a randomly generated Markov decision process; in the second, we try to rationalize a robot navigating through a grid and demonstrating goal-directed behavior.
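To make the adversarial formulation concrete, the following is a schematic LaTeX sketch of the kind of constrained saddle-point objective the abstract describes; the notation (a loss between the learner policy \hat{\pi} and an adversary policy \check{\pi}, feature map \phi, demonstrated feature expectations \tilde{\phi}, multipliers w) is assumed here for illustration and is not taken from the thesis:

    \min_{\hat{\pi}} \max_{\check{\pi} \in \tilde{\Xi}} \mathbb{E}\big[\mathrm{loss}(\hat{\pi}, \check{\pi})\big],
    \qquad
    \tilde{\Xi} = \big\{\check{\pi} : \mathbb{E}_{\check{\pi}}[\phi] = \tilde{\phi}\big\}.

Introducing Lagrange multipliers w for the feature-matching constraints removes them from the inner game,

    \min_{w} \, \min_{\hat{\pi}} \max_{\check{\pi}} \, \mathbb{E}\big[\mathrm{loss}(\hat{\pi}, \check{\pi})\big] + w^{\top}\big(\mathbb{E}_{\check{\pi}}[\phi] - \tilde{\phi}\big),

yielding the kind of unconstrained problem, convex in w, that the abstract refers to.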
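The NP-hard subproblem can also be made concrete. Below is a minimal Python sketch, written for illustration only (it is not the thesis's algorithm, nor its POMDP-based approximation): it brute-forces the deterministic policy maximizing the summed expected return over several MDPs that share states, actions, and rewards but differ in transition dynamics. The enumeration over |A|^|S| policies is exactly the exponential blow-up that motivates a tractable approximation.

    # Illustrative sketch: one deterministic policy for several MDPs
    # with shared states/actions/rewards but different dynamics.
    import itertools
    import numpy as np

    def policy_value(P, R, policy, gamma=0.95):
        """Expected discounted return of a deterministic policy in one MDP.

        P: transitions, shape (A, S, S); R: rewards, shape (S, A);
        policy: length-S sequence giving the action taken in each state.
        Solves the Bellman system (I - gamma * P_pi) v = r_pi.
        """
        S = R.shape[0]
        P_pi = np.stack([P[policy[s], s] for s in range(S)])   # (S, S)
        r_pi = np.array([R[s, policy[s]] for s in range(S)])   # (S,)
        v = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
        return v.mean()  # uniform initial-state distribution (an assumption)

    def best_joint_policy(dynamics, R, gamma=0.95):
        """Enumerate all |A|**|S| deterministic policies and return the one
        maximizing the total value summed across the MDPs in `dynamics`."""
        S, A = R.shape
        best, best_val = None, -np.inf
        for policy in itertools.product(range(A), repeat=S):
            val = sum(policy_value(P, R, policy, gamma) for P in dynamics)
            if val > best_val:
                best, best_val = policy, val
        return best, best_val

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        S, A, K = 4, 2, 3  # kept tiny on purpose: the search is |A|**|S|
        R = rng.standard_normal((S, A))
        dynamics = []
        for _ in range(K):
            P = rng.random((A, S, S))
            P /= P.sum(axis=2, keepdims=True)  # rows become distributions
            dynamics.append(P)
        policy, value = best_joint_policy(dynamics, R)
        print("best deterministic policy:", policy, "total value:", value)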
Keywords/Search Tags: Decision