
Research And Application Of Reinforcement Learning In Intelligent Safety-critical System

Posted on: 2022-02-25
Degree: Master
Type: Thesis
Country: China
Candidate: L Qian
Full Text: PDF
GTID: 2518306479493344
Subject: Software engineering
Abstract/Summary:
Reinforcement learning, and in particular the deep reinforcement learning that has emerged in recent years, has been applied successfully in many fields. However, reinforcement learning lacks safety assurance mechanisms, and growing concern about its safety has made it difficult to apply to intelligent safety-critical systems. The environment in which an agent operates is full of uncertainty, and a policy learning approach that only maximizes long-term return cannot cope with the various risks in the system. In addition, information perturbations in the environment disturb the agent's safety decisions, threatening the safety of both the agent and its physical environment. Formal methods, grounded in rigorous mathematical theory, provide credible theory and tool support for the safety assurance of safety-critical systems. However, existing formal methods are not well suited to the complex environments that agents have to deal with. To address the safety issues of reinforcement learning and the shortcomings of existing methods, this thesis proposes a general safe reinforcement learning method that uses run-time verification, drawing on formal modeling and verification theories and tools, to provide safety guarantees for reinforcement learning. The main work of this thesis includes:
(1) Probabilistic interval Computation Tree Logic (PiCTL) is proposed, and its syntax and semantics are formally defined, for describing the properties and constraints of uncertain real-time systems. In addition, an extension built on PRISM is implemented to verify PiCTL formulas.
(2) A safe learning algorithm, called Generic Safe Control with Supervisor (GSCS), is proposed. It tightly integrates formal verification with reinforcement learning so that safety constraints become part of the policy learned by the algorithm. A control monitor based on formal verification monitors the system state in real time, verifies the safety of the agent's decisions, and intervenes in the system's operation if and only if a decision would put the system at risk. In addition, for systems with information perturbations, this thesis introduces the concept of safety thresholds, and the monitor adopts a maximum-safety policy to reduce the risk as far as possible.
(3) A simulation evaluation environment based on the OpenAI Gym framework is designed, with an agent built on a Double Deep Q-Network (DDQN) and an automotive adaptive cruise control system as the model. Using this environment, the performance of the GSCS algorithm is evaluated under different experimental conditions and compared with classical reinforcement learning algorithms to demonstrate the feasibility and effectiveness of the GSCS algorithm.
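The monitor's decision rule described in (2) can be illustrated with a minimal Python sketch. This is not the GSCS implementation from the thesis: the names `supervise` and `toy_safety_prob`, the three-action set, and the linear safety score are all hypothetical, and `toy_safety_prob` merely stands in for a probability that a formal verifier would compute. The sketch only shows the intervention logic: the agent's action is passed through unless its estimated safety falls below a threshold, in which case the safest available action is substituted.

```python
from typing import Callable, Sequence, Tuple


def supervise(
    proposed: int,
    actions: Sequence[int],
    safety_prob: Callable[[int], float],
    threshold: float,
) -> Tuple[int, bool]:
    """Return (action to execute, whether the monitor intervened).

    `safety_prob` is an assumed callback giving the probability that
    taking an action keeps the system safe; in the thesis this role is
    played by formal verification, which is not reproduced here.
    """
    # Let the agent's decision through when it meets the safety threshold.
    if safety_prob(proposed) >= threshold:
        return proposed, False
    # Otherwise intervene with the maximum-safety action, even if that
    # action is itself below the threshold (risk is minimized, not removed).
    return max(actions, key=safety_prob), True


def toy_safety_prob(gap_m: float, closing_speed: float) -> Callable[[int], float]:
    """Hypothetical safety score for a cruise-control-like toy model.

    Actions: 0 = brake, 1 = hold, 2 = accelerate. The score grows with
    the gap to the lead vehicle after one time step, clipped to [0, 1].
    """
    accel = {0: -2.0, 1: 0.0, 2: 2.0}

    def prob(action: int) -> float:
        next_gap = gap_m - (closing_speed + accel[action]) * 1.0
        return max(0.0, min(1.0, next_gap / 10.0))

    return prob


if __name__ == "__main__":
    prob = toy_safety_prob(gap_m=8.0, closing_speed=2.0)
    print(supervise(2, [0, 1, 2], prob, threshold=0.7))
    print(supervise(0, [0, 1, 2], prob, threshold=0.7))
```

In this toy setting, accelerating toward a close lead vehicle is overridden by braking, while a braking decision passes through unchanged. Raising or lowering `threshold` changes how often the monitor intervenes.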
Keywords/Search Tags: Reinforcement learning, Safety guarantee, Probabilistic interval Computation Tree Logic, Uncertain hybrid systems, Controller monitor