
Inverse Reinforcement Learning And Imitation Learning With Applications In Intelligent Robotics

Posted on: 2012-11-22 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: Z J Jin | Full Text: PDF
GTID: 1118330371458961 | Subject: Computer Science and Technology
Abstract/Summary:
Imitation learning has become one of the most challenging fields in the robotics research community. In imitation learning, directly learning the state-action mapping fails to take long-term effects into account. Researchers therefore prefer to divide imitation learning into two phases: estimating the environment parameters and solving for the optimal controller. Given the requirements of minimal parameter tuning and strong generalization, the reward function's flexibility and sparseness make it the best choice for representing the environment parameters. For decades, imitation learning based on reward function recovery has therefore been one of the most popular topics in this field. Reward recovery, formally known as Inverse Reinforcement Learning (IRL), addresses the problem of recovering the reward function underlying a Markov decision process (MDP), given the dynamics of the system and the behavior of an agent. Unfortunately, existing IRL methods mostly suffer from the following problems: (1) the learning process proceeds in a batch manner and the reinforcement learning (RL) sub-procedure is time-consuming; (2) the reward function is represented as a discrete function without uncertainty information; (3) applying IRL to imitation learning imposes a demanding assumption of optimal demonstrations. To address these problems, this thesis investigates IRL methods in the sequential and Bayesian frameworks, respectively.

Firstly, this thesis investigates IRL algorithms in the sequential setting based on the max-margin principle and the constraint-consistency principle, respectively. Based on the max-margin principle, we present an incremental approach to sequential IRL: the IRL problem is modeled as a binary classification problem and solved within the quasi-additive sequential learning framework. Based on the constraint-consistency principle, we develop a sequential reward learning algorithm built on the relaxation method, which avoids the time-consuming RL sub-procedure while estimating the rewards. This approach converts the IRL problem into a nonlinear feasibility problem: it translates the demonstration into a set of constraints and projects the reward estimate onto them sequentially. Furthermore, constraint reduction is introduced to improve real-time performance. We analyze the convergence properties of both algorithms with detailed proofs.

Secondly, to represent the rewards more flexibly and provide uncertainty information, we present an approach that recovers both rewards and their uncertainty in continuous spaces by incorporating Gaussian processes (GPs). By remodeling the reward function, this approach offers viable solutions to several major limitations of existing IRL methods, such as the lack of confidence information for predictions and the difficulty of appropriately designating features.
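As a minimal illustration of the Gaussian-process machinery that such an approach builds on, the sketch below performs standard GP regression over reward values and returns a predictive mean and variance at query states. It assumes that noisy reward estimates at the demonstrated states are already available from a separate IRL step; the kernel, hyperparameters, and data are hypothetical and simplify the formulation used in the thesis.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between two sets of state features."""
    sq_dists = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return signal_var * np.exp(-0.5 * sq_dists / length_scale**2)

def gp_reward_posterior(X_demo, r_obs, X_query, noise_var=1e-2):
    """Posterior mean and variance of the reward at query states,
    given (possibly noisy) reward estimates at demonstrated states."""
    K = rbf_kernel(X_demo, X_demo) + noise_var * np.eye(len(X_demo))
    K_star = rbf_kernel(X_query, X_demo)
    K_inv = np.linalg.inv(K)  # a Cholesky solve would be preferable at scale
    mean = K_star @ K_inv @ r_obs
    cov = rbf_kernel(X_query, X_query) - K_star @ K_inv @ K_star.T
    return mean, np.diag(cov)

# Toy 2-D state features and reward estimates (all values hypothetical).
X_demo = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 1.0]])
r_obs = np.array([0.1, 0.6, 1.0])
X_query = np.array([[1.5, 0.8]])
mean, var = gp_reward_posterior(X_demo, r_obs, X_query)
print(mean, var)  # predicted reward and its uncertainty at the query state
```

The variance returned alongside the mean is exactly the confidence information that discrete, point-estimate reward representations lack.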
Thirdly, to relax the usually over-strong assumption of optimal demonstrations in IRL-based imitation learning, we give a solution based on Bayesian logistic regression. Because the posterior distribution takes a complicated form, a variational Bayes method is adopted to obtain the proper rewards, and we resort to the expectation-maximization (EM) algorithm to resolve the implicit dependencies between multiple parameters. Our method enjoys robustness to non-optimal demonstrations, probabilistic outputs, and a sparse form of rewards.

Finally, we investigate the application of IRL techniques to the behavior evaluation problem for intelligent agents. We present a novel trajectory evaluation method based on intention analysis. Features are extracted from the training sets using principal component analysis (PCA); the IRL algorithm, a reward-reshaping technique, and norms of orthogonal projections are then used to define and measure the difference between trajectories. This approach eliminates the ambiguity caused by the inherent ill-posedness of inverse problems and yields reasonable scores in finitely many steps, according to how far the evaluated trajectory deviates from the standard trajectory.
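The following sketch illustrates, under strong simplifying assumptions, the kind of projection-based comparison described above: each trajectory is assumed to be summarized by a fixed-length feature vector (for example, discounted feature expectations), PCA supplies a subspace from the training set, and the deviation of a candidate trajectory from the standard one is measured through norms of its projections onto and orthogonal to that subspace. The recovered reward and the reward-reshaping step of the thesis are omitted; all names and values here are hypothetical.

```python
import numpy as np

def pca_basis(F_train, k=3):
    """Top-k principal directions of the training feature matrix (rows = trajectories)."""
    F_centered = F_train - F_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(F_centered, full_matrices=False)
    return Vt[:k].T                      # shape (n_features, k)

def trajectory_score(f_standard, f_candidate, basis):
    """Deviation of a candidate from the standard trajectory, split into the part
    explained by the PCA subspace and the norm of the orthogonal residual."""
    P = basis @ basis.T                  # orthogonal projector onto the PCA subspace
    diff = f_candidate - f_standard
    in_plane = np.linalg.norm(P @ diff)          # disagreement within the subspace
    residual = np.linalg.norm(diff - P @ diff)   # component orthogonal to the subspace
    return in_plane + residual

# Hypothetical per-trajectory feature vectors.
F_train = np.random.default_rng(0).normal(size=(20, 6))
basis = pca_basis(F_train, k=3)
score = trajectory_score(F_train[0], F_train[1], basis)
print(score)  # larger score = larger deviation from the standard behavior
```

Splitting the deviation into an in-subspace term and an orthogonal residual keeps the score well defined even when two trajectories agree on the dominant PCA directions but differ elsewhere.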
Keywords/Search Tags: Inverse reinforcement learning, Imitation learning, Markov decision processes, Sequential learning, Relaxation projection, Gaussian processes, Variational Bayes, Bayesian logistic regression, Trajectory evaluation