
Research On Model-based Deep Reinforcement Learning With Active Exploration

Posted on: 2021-05-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y G Lin
Full Text: PDF
GTID: 2428330611465676
Subject: Software engineering
Abstract/Summary:
Deep reinforcement learning is a rising research field in modern artificial intelligence, and its methods can be divided into two categories: model-free and model-based. Model-based methods have higher sample efficiency but lag behind model-free methods in asymptotic performance. Recently, by combining Bayesian neural networks with model predictive control, model-based methods have become comparable to model-free methods in asymptotic performance on finite-horizon control tasks. However, such methods suffer from poor exploration, the model predictive control algorithm they rely on performs insufficiently during the online learning stage, and its computational complexity is high, which is unacceptable during the deployment stage. To address these problems, this paper proposes a model-based deep reinforcement learning method with active exploration. The contribution of this paper includes three parts (each is illustrated with a brief sketch after this abstract):

1) To improve the exploration efficiency of existing model-based methods, this paper proposes a model-uncertainty-aware active exploration algorithm. First, a model-uncertainty-aware exploration bonus is derived by maximizing the information gain. Then, the exploration bonus is approximated with an ensemble of bootstrapped deep neural networks and the Wasserstein distance. Finally, model predictive control is used for efficient active exploration. The algorithm only needs the state data generated by the existing model predictive controller rather than an additional exploration model, so efficient active exploration is achieved without increasing model redundancy.

2) To improve the performance of model predictive control during active exploration, this paper proposes a model predictive control method based on importance sampling. The method casts reinforcement learning as variational Bayesian inference and approximates the posterior distribution over action sequences with a multivariate Gaussian distribution. To minimize the KL divergence between the variational distribution and the posterior, the variational distribution is iteratively updated by the method of moments and importance sampling. Compared with the cross-entropy method, this approach introduces a maximum-entropy term to encourage exploration and fully exploits the differences among the expected returns of the sampled action sequences. As a result, it reduces the number of iterations and improves control performance.

3) To reduce the computational complexity during the deployment stage, this paper proposes model-based finite-horizon offline policy optimization. In the policy evaluation step, the Q function is learned offline from finite-horizon samples collected in the learned dynamics models via the Monte Carlo method. In the policy improvement step, stochastic gradient descent is used to minimize the KL divergence between the policy and a Boltzmann distribution derived from the Q function. Since the computational cost of a policy network is much lower, this method not only approaches the performance of model predictive control but is also significantly faster.

Experiments are designed and conducted on four sparse-reward or hard-exploration reinforcement learning environments, verifying the proposed overall algorithm and its three major parts in terms of sample efficiency, control performance, and execution efficiency.
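The sketches below illustrate the three parts under stated assumptions; function names, temperatures, and tensor shapes are illustrative choices, not the thesis's exact implementation.

For part 1, a common way to approximate an information-gain bonus with a bootstrapped ensemble is to measure how much the ensemble members disagree about the next state. This minimal sketch assumes each of the B dynamics models outputs a diagonal-Gaussian prediction of the next state and scores a state-action pair by the average pairwise 2-Wasserstein distance between those predictions; whether the thesis uses pairwise or reference-based distances is not stated in the abstract.

    import numpy as np

    def w2_diag_gaussian(mu1, std1, mu2, std2):
        """Squared 2-Wasserstein distance between two diagonal Gaussians:
        W2^2 = ||mu1 - mu2||^2 + ||std1 - std2||^2."""
        return np.sum((mu1 - mu2) ** 2) + np.sum((std1 - std2) ** 2)

    def exploration_bonus(means, stds):
        """Average pairwise W2 distance among B ensemble predictions.

        means, stds: arrays of shape (B, state_dim), one row per bootstrap model.
        A larger value means the ensemble disagrees more about the next state,
        i.e. the model is more uncertain there and the state is worth exploring.
        """
        B = means.shape[0]
        total, pairs = 0.0, 0
        for i in range(B):
            for j in range(i + 1, B):
                total += np.sqrt(w2_diag_gaussian(means[i], stds[i], means[j], stds[j]))
                pairs += 1
        return total / max(pairs, 1)

    # Toy usage: 5 bootstrap models predicting a 3-dimensional next state.
    rng = np.random.default_rng(0)
    means = rng.normal(size=(5, 3))
    stds = np.exp(0.1 * rng.normal(size=(5, 3)))
    print(exploration_bonus(means, stds))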
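For part 2, the described update, a Gaussian over action sequences refit by importance sampling and the method of moments with maximum-entropy (softmax) weighting of sampled returns, can be sketched as follows. The temperature, sample count, and the evaluate_return callback (which would roll the sequences out in the learned dynamics model) are assumptions for illustration.

    import numpy as np

    def importance_sampling_mpc_step(mean, std, evaluate_return,
                                     n_samples=256, temperature=1.0, rng=None):
        """One iteration of a variational MPC update over a flattened action sequence.

        mean, std: current Gaussian parameters, shape (horizon * action_dim,).
        evaluate_return: callable mapping sampled sequences (N, D) to returns (N,).
        Samples are reweighted with a softmax of their returns (a Boltzmann /
        maximum-entropy weighting) and the Gaussian is refit by the method of moments.
        """
        if rng is None:
            rng = np.random.default_rng()
        samples = mean + std * rng.standard_normal((n_samples, mean.shape[0]))
        returns = evaluate_return(samples)
        # Subtracting the max keeps the exponentials numerically stable.
        w = np.exp((returns - returns.max()) / temperature)
        w /= w.sum()
        new_mean = w @ samples
        new_var = w @ (samples - new_mean) ** 2
        return new_mean, np.sqrt(new_var + 1e-8)

    # Toy usage: the "model" rewards sequences close to a fixed target sequence.
    target = np.linspace(-1.0, 1.0, 10)
    mean, std = np.zeros(10), np.ones(10)
    for _ in range(5):
        mean, std = importance_sampling_mpc_step(
            mean, std, lambda a: -np.sum((a - target) ** 2, axis=1))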
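For part 3, policy improvement is described as minimizing, by stochastic gradient descent, the KL divergence between the policy and a Boltzmann distribution derived from the Q function. The sketch below assumes a discrete action space, PyTorch, and one particular direction of the KL; the thesis's actual parameterization and KL direction may differ.

    import torch
    import torch.nn.functional as F

    def policy_improvement_step(policy_net, optimizer, states, q_values, temperature=1.0):
        """One SGD step pulling the policy toward the Boltzmann distribution of Q.

        states:   (N, state_dim) batch sampled from model rollouts.
        q_values: (N, n_actions) Monte Carlo Q estimates for those states
                  (a discrete action space is assumed purely to keep the sketch short).
        Loss: mean over the batch of KL( pi(.|s) || softmax(Q(s,.) / temperature) ).
        """
        logits = policy_net(states)
        log_pi = F.log_softmax(logits, dim=-1)
        log_target = F.log_softmax(q_values / temperature, dim=-1)
        loss = (log_pi.exp() * (log_pi - log_target)).sum(dim=-1).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    # Toy usage with a linear policy over 3 discrete actions in a 4-dimensional state space.
    policy_net = torch.nn.Linear(4, 3)
    optimizer = torch.optim.Adam(policy_net.parameters(), lr=1e-3)
    states, q_values = torch.randn(32, 4), torch.randn(32, 3)
    policy_improvement_step(policy_net, optimizer, states, q_values)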
Keywords/Search Tags: Model-Based Reinforcement Learning, Active Exploration, Bayesian Neural Networks, Model Predictive Control, Variational Inference