
Data-driven Q-Learning Stabilization Control

Posted on: 2020-05-18    Degree: Master    Type: Thesis
Country: China    Candidate: C Z Yang    Full Text: PDF
GTID: 2428330599976313    Subject: Control Science and Engineering
Abstract/Summary:
Modern control theory is constrained by the complexity of the system model and the feasibility of the modeling assumptions, so it often cannot adequately stabilize the increasingly complex controlled systems encountered in practical production applications. At the same time, owing to the development of computer science, these complex systems generate large amounts of data during production operation, and these data contain more information about the system than the system model does. Using such measurement data while skipping the modeling process, that is, using data-driven control to design controllers that satisfy performance requirements for complex systems, is therefore of great practical value.

Approximate Q-learning (AQL), a typical reinforcement learning method, has attracted extensive attention in recent years because of its outstanding ability to solve the nonlinear optimal control problem when knowledge or a model of the plant is unavailable. However, because of function approximation errors, AQL algorithms can only give a near-optimal solution of the nonlinear optimal control problem. Hence, the analysis of the optimality error bound is an important issue, and it has not been fully resolved in the published literature. In this paper, value-iteration AQL is used to solve the model-free/data-driven optimal stabilization control problem, and a new optimality error bound analysis framework is proposed. The main research contents are as follows.

Firstly, for convenience and clarity in analyzing the optimality error bound of the nonlinear optimal control system, the concept of the Q-learning operator is proposed and its properties are explained; the Q-learning operator is well defined based on an estimate of the domain of attraction (DOA) of the closed-loop system. Secondly, Gaussian process regression (GPR), a Bayesian modeling method defined over distributions of functions, is selected as the function estimator; since GPR provides the standard deviations of its predictions, these serve as the function approximation error bound, and a quantitative estimation error bound for the optimal Q-function is obtained. Finally, from the Q-function error bound, a quantitative optimality error bound, namely the bound on the gap between the optimal cost and the actual cost of the AQL closed-loop, is derived, and a near-optimal controller is obtained as well.

Simulation experiments on both linear and nonlinear controlled plants are carried out. The experimental results show that, using the approximate Q-learning algorithm together with the optimality error bound analysis framework proposed in this paper, a data-driven suboptimal controller for the controlled plant can be obtained, and its optimality error bound is given. As the main result of this paper shows, the optimality error bound would vanish if the amount of data used to estimate the Q-functions and the number of iterations both went to infinity.
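To make the described procedure concrete, the following Python sketch illustrates value-iteration approximate Q-learning with a Gaussian process regressor as the Q-function estimator, where the GP posterior standard deviation plays the role of the function approximation error bound. This is a minimal illustration, not the thesis code: the scalar plant dynamics, stage cost, discount factor, action grid, and sampling ranges are all illustrative assumptions.

```python
# Minimal sketch (illustrative assumptions, not the thesis implementation):
# value-iteration approximate Q-learning with GPR as the Q-function estimator.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def dynamics(x, u):
    # Hypothetical scalar nonlinear plant, used only for illustration.
    return 0.9 * x + 0.1 * np.sin(x) + u

def stage_cost(x, u):
    return x ** 2 + u ** 2            # quadratic stage cost (assumed)

gamma = 0.95                          # discount factor (assumed)
actions = np.linspace(-1.0, 1.0, 21)  # finite action grid for the greedy backup

# Transition data (x, u, x') sampled from the plant: the model-free setting.
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=200)
U = rng.choice(actions, size=200)
Xn = dynamics(X, U)

gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(), alpha=1e-3)
Q = np.zeros(200)                     # initial iterate Q_0 = 0

for k in range(30):                   # value-iteration sweeps
    gp.fit(np.column_stack([X, U]), Q)
    # Bellman backup: Q_{k+1}(x,u) = c(x,u) + gamma * min_{u'} Q_k(x',u')
    next_inputs = np.column_stack([np.repeat(Xn, len(actions)),
                                   np.tile(actions, len(Xn))])
    q_next = gp.predict(next_inputs).reshape(len(Xn), len(actions))
    Q = stage_cost(X, U) + gamma * q_next.min(axis=1)

# The GP posterior standard deviation is the data-driven bound on the
# Q-function approximation error used in the optimality error bound analysis.
q_hat, q_std = gp.predict(np.column_stack([X, U]), return_std=True)
print("max posterior std over the data set:", q_std.max())
```

The greedy (near-optimal) controller induced by the learned Q-function is then obtained at any state by minimizing the GP prediction over the action grid.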
Keywords/Search Tags: Q-learning, Data-based control, Reinforcement learning, Domain of attraction, Asymptotic stabilization