
Robust Control Of Stochastic Jump Systems By Temporal Difference Learning

Posted on: 2022-06-12  Degree: Master  Type: Thesis
Country: China  Candidate: Y G Chen  Full Text: PDF
GTID: 2518306527984339  Subject: Control Science and Engineering
Abstract/Summary:
In practical engineering applications, production needs, changes in working conditions, or emergencies often require a system to operate in several modes: the system states evolve continuously in time, while discrete-time jumps switch the system between operating modes according to certain transition rules. Such systems are called stochastic jump systems. The transition probabilities describe the random switching between modes and are therefore a key factor in the study of this class of systems. For stochastic jump systems with completely known transition probabilities, a complete theory has been established. In engineering practice, however, the transition probabilities are rarely fully known, which makes the study of stochastic jump systems considerably more challenging.

Temporal Difference (TD) learning, one of the core ideas of reinforcement learning, estimates the optimal policy of a Markov Decision Process (MDP) so as to maximize the return. On the one hand, TD learning does not depend on the model's probability parameters, and its online updating mechanism with eligibility traces gives the algorithm fast convergence. On the other hand, the control problem of stochastic jump systems can be converted into an MDP. This thesis therefore introduces a TD-learning-based robust control scheme in which the value functions converge to the solutions of the coupled Riccati equations by observing the mode trajectories; from these, a controller that stabilizes the closed-loop system and meets the performance requirements is obtained. The main research work of this thesis is as follows:

(1) We introduce the basic concepts, the learning process, and the algorithmic framework of the mainstream TD learning methods. In combination with application scenarios from the control field, we present model-free TD learning control theory, such as control methods for linear systems with unknown dynamics and for stochastic jump systems with unknown transition probabilities, which provides the theoretical basis for the subsequent chapters.

(2) A TD(λ) learning approach is designed for the robust control problem of Markov jump systems. The algorithm uses the idea of value function approximation and proceeds in two steps. (i) Policy estimation: the eligibility traces and value functions are updated each time a mode jump occurs, until the value function of every mode converges; the eligibility trace weighs future mode observations against the current one, which makes the iteration more flexible and better grounded. (ii) Policy improvement: the control policy is updated according to the converged value functions. It is proved that the value functions and the control policy converge, respectively, to the solutions of the coupled Riccati equations and to the robust controller. Comparison with existing methods verifies the effectiveness and superiority of the TD(λ) learning robust control method under unknown transition probabilities; a minimal sketch of the eligibility-trace mechanism follows below.
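To make the eligibility-trace mechanism concrete, here is a minimal tabular TD(λ) sketch in Python for estimating per-mode values of a finite Markov chain. It illustrates only the trace update, not the thesis's full matrix-valued scheme; the chain, costs, and learning parameters are illustrative assumptions, and the true transition matrix is used solely to simulate jumps, never read by the learner.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-mode chain; P_true is used only to simulate mode jumps,
# the learner itself never reads it (mirroring the unknown-TP setting).
P_true = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.2, 0.3, 0.5]])
cost = np.array([1.0, 0.0, -1.0])    # illustrative per-mode stage cost
gamma, lam, alpha = 0.9, 0.8, 0.05   # discount, trace decay, step size

V = np.zeros(3)                      # value estimate for each mode
for episode in range(500):
    z = np.zeros(3)                  # eligibility trace, reset each episode
    mode = rng.integers(3)
    for _ in range(100):
        nxt = rng.choice(3, p=P_true[mode])
        delta = cost[mode] + gamma * V[nxt] - V[mode]   # TD error
        z *= gamma * lam             # decay all traces
        z[mode] += 1.0               # reinforce the visited mode
        V += alpha * delta * z       # TD(lambda) update weighted by traces
        mode = nxt

print("learned per-mode values:", V)
```

In the thesis's setting, the scalar values V are replaced by matrix-valued quadratic value functions, one per mode, whose fixed points are the solutions of the coupled Riccati equations.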
(3) The TD(λ) learning algorithm is extended to the control problem of semi-Markov jump systems. First, with known transition probabilities, stability and control conditions based on coupled Riccati equations are derived. Building on these Riccati equations, a TD(λ) control algorithm that needs no transition probability information is then designed, yielding a control sequence that depends on the mode sojourn time. Finally, the theoretical results are applied to a macroeconomic system to illustrate the learning accuracy of the algorithm and the practicability of the control method.
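For reference, the known-transition-probability baseline mentioned in (3) can be computed directly: when the transition matrix is available, the mode-coupled Riccati equations admit a simple fixed-point iteration. The sketch below is a generic solver for the discrete-time Markov jump LQR case, not the thesis's robust or semi-Markov formulation; the system matrices, weights, and transition matrix are made-up assumptions.

```python
import numpy as np

def solve_coupled_riccati(A, B, Q, R, Pt, iters=300):
    """Fixed-point (value-iteration-style) solver for the coupled equations
    P_i = Q + A_i' E_i A_i - A_i' E_i B_i (R + B_i' E_i B_i)^{-1} B_i' E_i A_i,
    where E_i = sum_j Pt[i, j] P_j and Pt is the known transition matrix."""
    N, n = len(A), A[0].shape[0]
    P = [np.eye(n) for _ in range(N)]
    for _ in range(iters):
        E = [sum(Pt[i, j] * P[j] for j in range(N)) for i in range(N)]
        P = [Q + A[i].T @ E[i] @ A[i]
             - A[i].T @ E[i] @ B[i] @ np.linalg.solve(
                   R + B[i].T @ E[i] @ B[i], B[i].T @ E[i] @ A[i])
             for i in range(N)]
    # mode-dependent gains u = K_i x recovered from the converged solution
    E = [sum(Pt[i, j] * P[j] for j in range(N)) for i in range(N)]
    K = [-np.linalg.solve(R + B[i].T @ E[i] @ B[i], B[i].T @ E[i] @ A[i])
         for i in range(N)]
    return P, K

# Illustrative two-mode system (all values are assumptions, not thesis data).
A = [np.array([[0.8, 0.2], [0.0, 0.7]]), np.array([[0.7, -0.2], [0.1, 0.8]])]
B = [np.array([[1.0], [0.5]]), np.array([[0.3], [1.0]])]
Pt = np.array([[0.6, 0.4], [0.3, 0.7]])
P, K = solve_coupled_riccati(A, B, np.eye(2), np.eye(1), Pt)
print("mode-dependent gains:", K)
```

The TD(λ) scheme described above targets the same fixed points, but estimates the mode-coupling terms E_i from observed mode trajectories instead of computing them from a known transition matrix.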
Keywords/Search Tags:Stochastic jump system, temporal difference learning, robust control, Riccati equation, value function