
Model-based Safe Reinforcement Learning

Posted on: 2022-07-10    Degree: Master    Type: Thesis
Country: China    Candidate: L Y Du    Full Text: PDF
GTID: 2518306572451344    Subject: Control Science and Engineering
Abstract/Summary:
As a learning-based planning method, reinforcement learning (RL) is widely studied because it can plan without prior knowledge of the dynamics. Traditional model-free RL methods suffer from high sample complexity: they must collect large amounts of data to learn the optimal policy. By contrast, model-based RL methods first learn a model that approximates the dynamics and then use that model to plan actions, which improves sample efficiency dramatically. Moreover, to apply RL algorithms in safety-critical domains, it is crucial to ensure that the agent executes only safe actions, so that the constraints are always satisfied. Such problems are generally formulated as constrained planning problems. However, because the learned model is inaccurate, it is difficult to guarantee the safety of actions planned with it. To address this problem, this thesis proposes a sample-efficient, model-based RL planning method that maintains safety throughout the learning process. The main results are summarized as follows:

(1) Quantifying the uncertainty of the predicted state trajectory. Considering a discrete-time nonlinear system with bounded noise, this thesis learns the dynamics with an ensemble of probabilistic neural networks, which captures and quantifies both the epistemic and the aleatoric uncertainty arising during model learning (see the first sketch below). Based on this model, a method is developed to bound the error between the state trajectory predicted by the learned model and the real state trajectory under a given action sequence.

(2) Model predictive control based on the learned model. Model predictive control (MPC) is used to solve planning problems with safety constraints. By combining the original constraints with the uncertainty of the predicted state trajectory, a tube-based constraint is introduced: satisfying the tube-based constraint implies that the real state trajectory satisfies the original constraints (see the tube-check sketch below). Bumpless constraints, which cannot be ignored in real-world applications, are also introduced. Based on these constraints, a mathematical model for MPC is formulated.

(3) A robust method for solving the constrained optimization problems. Based on the cross-entropy method, this thesis proposes a solver for the complex constrained optimization problems defined by the tube-based constraints, the ensemble of probabilistic neural networks, and MPC (see the CEM sketch below). In addition, a method is developed to handle the case where these optimization problems have no feasible solution, which improves the robustness of the proposed method.

(4) A safe model-based reinforcement learning algorithm and its evaluation. Building on the above results, this thesis proposes a safe model-based RL algorithm that plans high-performance actions while satisfying the constraints throughout the learning process; the final sketch below shows how the pieces fit together. Several constrained navigation experiments with unmanned ground vehicles are designed to evaluate the proposed algorithm in terms of safety, control performance, and sample efficiency.
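The abstract names its core techniques but includes no code, so the sketches below are illustrative only. First, a minimal sketch of an ensemble of probabilistic neural networks, assuming PyTorch and Gaussian next-state predictions trained with a negative log-likelihood loss: the disagreement between member means serves as epistemic uncertainty, and the average predicted variance as aleatoric uncertainty. All class and function names are hypothetical, not the thesis's implementation.

```python
import torch
import torch.nn as nn

class ProbabilisticNet(nn.Module):
    """One ensemble member: predicts a Gaussian over the next state.
    The predicted variance captures aleatoric (noise) uncertainty."""
    def __init__(self, state_dim, action_dim, hidden=200):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
        )
        self.mean_head = nn.Linear(hidden, state_dim)
        self.logvar_head = nn.Linear(hidden, state_dim)

    def forward(self, state, action):
        h = self.body(torch.cat([state, action], dim=-1))
        mean = self.mean_head(h)
        logvar = self.logvar_head(h).clamp(-10.0, 2.0)  # keep variance well-behaved
        return mean, logvar

def gaussian_nll(mean, logvar, target):
    """Negative log-likelihood loss used to train each member."""
    inv_var = torch.exp(-logvar)
    return (((target - mean) ** 2) * inv_var + logvar).mean()

def ensemble_predict(members, state, action):
    """Epistemic uncertainty = variance across member means (needs >= 2 members);
    aleatoric uncertainty = average predicted variance."""
    means, variances = [], []
    for net in members:
        mean, logvar = net(state, action)
        means.append(mean)
        variances.append(torch.exp(logvar))
    means = torch.stack(means)       # (n_members, batch, state_dim)
    variances = torch.stack(variances)
    epistemic = means.var(dim=0)
    aleatoric = variances.mean(dim=0)
    return means.mean(dim=0), epistemic, aleatoric
```

The standard recipe for such ensembles is to minimize `gaussian_nll` per member on bootstrapped subsets of the replay data, so that members disagree where data is scarce.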
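Next, a sketch of the tube-based constraint check from contribution (2), under the simplifying assumption that the constraint function is 1-Lipschitz, so the per-step error bound can be added directly to the constraint margin; `constraint_fn` and `error_bounds` are placeholder names, and the thesis's actual tightening may differ.

```python
import numpy as np

def tube_violation(pred_states, error_bounds, constraint_fn):
    """Tube-based check: the nominal trajectory is deemed safe only if the
    constraint margin at every step exceeds that step's error bound between
    predicted and real states. constraint_fn(s) <= 0 means s is safe.
    Returns the worst-case violation (<= 0 means the whole tube is safe)."""
    margins = np.array([constraint_fn(s) for s in pred_states])
    return float(np.max(margins + error_bounds))
```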
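For contribution (3), a sketch of a cross-entropy-method planner over action sequences. The additive penalty on constraint violations is one plausible way to keep the search well-behaved when no feasible sequence exists, not necessarily the thesis's infeasibility-handling mechanism; `rollout_fn` is a placeholder for a rollout under the learned model.

```python
import numpy as np

def cem_plan(rollout_fn, horizon, action_dim, act_low, act_high,
             n_samples=500, n_elites=50, n_iters=5, penalty=1e4):
    """Cross-entropy method over action sequences. rollout_fn maps an action
    sequence to (cost, worst constraint violation) under the learned model
    with tube-tightened constraints. Infeasible samples are penalized in
    proportion to their violation, so the search degrades gracefully."""
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(n_iters):
        samples = np.clip(
            mean + std * np.random.randn(n_samples, horizon, action_dim),
            act_low, act_high)
        scores = np.empty(n_samples)
        for i, seq in enumerate(samples):
            cost, violation = rollout_fn(seq)
            scores[i] = cost + penalty * max(violation, 0.0)  # prefer feasible
        elites = samples[np.argsort(scores)[:n_elites]]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mean[0]  # first action of the best plan (receding horizon)
```

At every control step the planner returns only the first action and is rerun from the next measured state, which is the receding-horizon pattern the abstract's MPC formulation implies.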
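Finally, a sketch of how these pieces might be wired together for the overall algorithm in contribution (4). It assumes the `ensemble_predict` and `tube_violation` helpers sketched above are in scope, propagates only the ensemble-mean trajectory for brevity, and uses `cost_fn` as a placeholder stage cost; none of this is the thesis's actual architecture.

```python
import numpy as np
import torch

def make_rollout_fn(members, init_state, cost_fn, constraint_fn, error_bounds):
    """Glue for cem_plan: roll an action sequence through the ensemble-mean
    dynamics, accumulate cost, and report the worst tube-based violation."""
    def rollout_fn(action_seq):
        with torch.no_grad():
            state = torch.as_tensor(init_state, dtype=torch.float32)
            total_cost, states = 0.0, []
            for a in action_seq:
                a_t = torch.as_tensor(a, dtype=torch.float32)
                mean, _, _ = ensemble_predict(members, state[None], a_t[None])
                state = mean[0]
                states.append(state.numpy())
                total_cost += float(cost_fn(states[-1], a))
            return total_cost, tube_violation(states, error_bounds, constraint_fn)
    return rollout_fn
```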
Keywords/Search Tags: Reinforcement Learning, Safety Constraints, Model Predictive Control, Uncertainty, Ensembles of Probabilistic Neural Networks