Research And Application Of Reinforcement Learning In Intelligent Safety-critical System

Posted on:2022-02-25

Degree:Master

Type:Thesis

Country:China

Candidate:L Qian

Full Text:PDF

GTID:2518306479493344

Subject:Software engineering

Abstract/Summary:

Reinforcement learning, especially the deep reinforcement learning that has emerged in recent years, has been applied successfully in many fields. At the same time, however, reinforcement learning lacks safety assurance mechanisms, while concern about and demand for its safety keep growing, which makes it difficult to apply reinforcement learning to intelligent safety-critical systems. The environment in which an agent is embedded is full of uncertainty, and a policy-learning approach that only maximizes long-term return can hardly cope with the various risks in the system. In addition, information perturbations in the environment severely disturb the agent's safety decisions, threatening the safety of the agent and of the physical environment it operates in. Formal methods, grounded in rigorous mathematical theory, provide credible theoretical and tool support for the safety assurance of safety-critical systems. However, existing formal methods are not well suited to the complex environments that agents must handle. Addressing the safety issues of reinforcement learning and the shortcomings of existing methods, this thesis proposes a general safe reinforcement learning method that uses run-time verification, supported by formal modeling and verification theory and tools, to provide safety guarantees for reinforcement learning. The main contributions are:

(1) Probabilistic interval Computation Tree Logic (PiCTL) is proposed, and its syntax and semantics are formally defined, for describing properties and constraints of uncertain real-time systems. In addition, PRISM is extended (secondary development) to verify PiCTL formulas.

(2) A safe learning algorithm called Generic Safe Control with Supervisor (GSCS) is proposed. It integrates formal verification with reinforcement learning so that safety constraints become part of the policy the algorithm learns. A controller monitor based on formal verification observes the system state in real time, verifies the safety of the agent's decisions, and intervenes in the system's operation if and only if a decision would put the system at risk. For systems with information perturbations, the thesis further introduces the concept of safety thresholds, and the monitor adopts a maximum-safety policy to minimize risk as far as possible.

(3) A simulation and evaluation environment is built on the OpenAI Gym framework, using an automotive adaptive cruise control system as the model, with an agent based on Double Deep Q-Network (DDQN). In this environment, the GSCS algorithm is evaluated under different experimental conditions and compared with classical reinforcement learning algorithms to demonstrate its feasibility and effectiveness.
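The abstract does not give PiCTL's syntax or semantics. As a rough illustration of the kind of check contribution (1) targets, the Python sketch below assumes that models are interval Markov chains (each transition probability is known only as an interval [lower, upper]) and that a formula P_[a,b][F<=k goal] holds when every resolution of the intervals yields a k-step reachability probability inside [a, b]. It computes the minimum and maximum probabilities by value iteration, resolving the intervals greedily at each step. All names (`IntervalDTMC`, `bounded_reach`, `check_prob_interval`) are hypothetical, and this is not the thesis's PRISM extension.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding of an interval DTMC:
# state -> list of (successor, lower_prob, upper_prob). Successors are unique,
# and sum(lower) <= 1 <= sum(upper) is assumed for every state.
IntervalDTMC = Dict[int, List[Tuple[int, float, float]]]


def extreme_expectation(succ: List[Tuple[int, float, float]],
                        values: Dict[int, float], maximize: bool) -> float:
    """Expected value of `values` under the distribution inside the intervals
    that maximizes (or minimizes) it."""
    probs = {s: lo for s, lo, _ in succ}          # start at the lower bounds
    remaining = 1.0 - sum(probs.values())         # probability mass still unassigned
    # Hand leftover mass to the highest-valued (or lowest-valued) successors first.
    for s, lo, hi in sorted(succ, key=lambda t: values[t[0]], reverse=maximize):
        if remaining <= 0.0:
            break
        extra = min(hi - lo, remaining)
        probs[s] += extra
        remaining -= extra
    return sum(p * values[s] for s, p in probs.items())


def bounded_reach(model: IntervalDTMC, goal: Set[int], k: int,
                  maximize: bool) -> Dict[int, float]:
    """Max or min probability of reaching `goal` within k steps, with the
    interval choices resolved adversarially at each step."""
    values = {s: 1.0 if s in goal else 0.0 for s in model}
    for _ in range(k):
        values = {s: 1.0 if s in goal else extreme_expectation(succ, values, maximize)
                  for s, succ in model.items()}
    return values


def check_prob_interval(model: IntervalDTMC, state: int, goal: Set[int],
                        k: int, a: float, b: float) -> bool:
    """Assumed reading of P_[a,b][F<=k goal]: true if every resolution of the
    intervals gives a reachability probability within [a, b]."""
    low = bounded_reach(model, goal, k, maximize=False)[state]
    high = bounded_reach(model, goal, k, maximize=True)[state]
    return a <= low and high <= b


# Toy model: state 0 moves to unsafe state 1 or safe state 2 (both absorbing).
example = {0: [(1, 0.1, 0.2), (2, 0.8, 0.9)], 1: [(1, 1.0, 1.0)], 2: [(2, 1.0, 1.0)]}
print(check_prob_interval(example, 0, {1}, 5, 0.0, 0.25))  # True: bounds are [0.1, 0.2]
```

The greedy step is the usual way to optimize a linear objective over interval constraints with a fixed total mass: fill every lower bound, then distribute the leftover mass in order of successor value.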
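To make the monitor in contribution (2) concrete, the sketch below shows a run-time shield for an adaptive cruise controller that passes the agent's action through if and only if it is judged safe, and otherwise substitutes the action with the largest worst-case safety margin. It rests on assumptions that are not in the abstract: a one-step kinematic prediction with a constant-speed lead vehicle stands in for formal verification, the required gap is `d_min + headway * v_ego`, the information perturbation is a bounded gap-measurement error `eps`, and the safety threshold is modeled as a minimum required margin `threshold`. The names `ACCState`, `safety_margin`, and `monitor`, the action set, and all parameter values are hypothetical.

```python
from dataclasses import dataclass

ACTIONS = (-3.0, -1.0, 0.0, 1.0, 2.0)  # hypothetical accelerations, m/s^2


@dataclass
class ACCState:
    gap: float     # measured distance to the lead vehicle (m)
    v_ego: float   # ego vehicle speed (m/s)
    v_lead: float  # lead vehicle speed (m/s), assumed constant over one step


def safety_margin(s: ACCState, accel: float, dt: float = 0.5, d_min: float = 5.0,
                  headway: float = 1.5, eps: float = 0.0) -> float:
    """Worst-case gap minus the required safe gap after one step.
    eps is an assumed bound on the gap-measurement perturbation."""
    v_next = max(0.0, s.v_ego + accel * dt)
    ego_travel = 0.5 * (s.v_ego + v_next) * dt     # approximate: average speed
    gap_next = (s.gap - eps) + s.v_lead * dt - ego_travel
    return gap_next - (d_min + headway * v_next)


def monitor(s: ACCState, proposed: float, threshold: float = 0.0,
            eps: float = 0.0) -> tuple:
    """Return (action, intervened). Keep the agent's action unless its worst-case
    margin is below `threshold`; otherwise fall back to the max-margin action."""
    if safety_margin(s, proposed, eps=eps) >= threshold:
        return proposed, False
    safest = max(ACTIONS, key=lambda a: safety_margin(s, a, eps=eps))
    return safest, True


state = ACCState(gap=20.0, v_ego=20.0, v_lead=15.0)
print(monitor(state, proposed=2.0))  # (-3.0, True): accelerating is overridden
```

In the example state even full braking leaves a negative margin; the monitor still selects it because, under a maximum-safety fallback, the least risky available action is preferred when no action is fully safe. A one-step lookahead is myopic; a verifier-based monitor would typically reason over a longer horizon.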
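Contribution (3) uses a DDQN agent. Double DQN reduces the overestimation bias of standard DQN by letting the online network choose the next action and the target network evaluate it, giving the target r + gamma * Q_target(s', argmax_a Q_online(s', a)) for non-terminal transitions and r for terminal ones. The minimal sketch below computes these targets from precomputed Q-values in plain Python; the function name `ddqn_targets` and its input layout are hypothetical and are not taken from the thesis.

```python
from typing import List, Sequence


def ddqn_targets(rewards: Sequence[float], dones: Sequence[bool],
                 q_online_next: Sequence[Sequence[float]],
                 q_target_next: Sequence[Sequence[float]],
                 gamma: float = 0.99) -> List[float]:
    """Double DQN targets: the online network selects the greedy next action,
    the target network supplies that action's value."""
    targets = []
    for r, done, q_on, q_tg in zip(rewards, dones, q_online_next, q_target_next):
        if done:
            targets.append(r)  # no bootstrapping past a terminal state
            continue
        best = max(range(len(q_on)), key=lambda a: q_on[a])  # select with online net
        targets.append(r + gamma * q_tg[best])               # evaluate with target net
    return targets
```

For a batch with rewards [1.0, 0.5], where the online network prefers action 1 (Q-values [1.0, 3.0]) and the target network values that action at 0.5, the first target is 1 + 0.9 * 0.5 = 1.45 with gamma = 0.9; vanilla DQN would instead bootstrap from the target network's own maximum, 2.0, and obtain 2.8.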

Keywords/Search Tags:

Reinforcement learning, Safety guarantee, Probabilistic interval computation tree logic, Uncertain hybrid systems, Controller monitor

Related items

1	Research On The Logical Expression Ability Of Generalized Likelihood Computing Trees
2	Research And Application Of Dependable Reinforcement Learning Based On Timed Differential Dynamic Logic
3	Research On Probabilistic Verification Of SysML Activity Diagram For Safety-critical Embedded System
4	Model-based Safe Reinforcement Learning
5	The Design And Implementation Of Computer-Aided Probabilistic Safety Analysis Software
6	The Application Of Computational Intelligence In Control, Optimization, And Decision
7	Stability Analysis And Parametric Controller Design Of Uncertain Distributed Parameter Systems
8	An Improved Probabilistic Database Model And Its Probabilistic Nearest Neighbors Query Research
9	Safety Controller Synthesis Of Interval Type-2 T-S Model-based Networked Systems
10	Studies On The Flexible Interval-logic And Its Reasoning