
Research On Tracking Control Based On Data-driven Q-learning

Posted on: 2021-02-04
Degree: Master
Type: Thesis
Country: China
Candidate: G Y Zhao
Full Text: PDF
GTID: 2370330611466507
Subject: Control Science and Engineering

Abstract/Summary:
Optimal tracking control has long been a key focus of the control community. It seeks to optimize a predefined performance index composed of the tracking error and the control input, so that the system output tracks the target with low performance cost. Conventional methods for solving the tracking problem generally require partial or even complete knowledge of the system dynamics in advance, which restricts their practical application. Adaptive dynamic programming (ADP), an intelligent control approach with self-learning and optimization capability, has become a novel tool for tackling the optimal control of systems with unknown dynamics. In practice, however, the internal states of a system are often inaccessible, which limits ADP algorithms built on the state-feedback framework. At the same time, owing to the presence of probing noise, ADP schemes developed via value function approximation produce biased estimates of the optimal controller parameters. Therefore, to deal with the linear quadratic tracking (LQT) problem for unknown discrete-time linear systems, a Q-learning scheme consisting of a critic structure and an actor structure is proposed. The specific research work is as follows.

1. Solving the optimal LQT problem with completely unknown system dynamics. First, an augmented system is constructed from the original controlled system and the reference trajectory. Then, the augmented system state is expressed in terms of past input, output, and reference trajectory sequences, and the Q-function Bellman equation is derived from the reconstructed state (the equation is sketched below, after the abstract). Finally, the optimal output feedback controller is learned by iterative algorithms driven by measured data. The estimated parameters of the critic-actor structure are updated online and converge rapidly to their optimal values, without the bias created by the excitation noise.

2. For the output feedback Q-learning algorithms, the thesis conducts an extensive study covering three variants: on-policy data-driven Q-learning policy iteration (PI), off-policy data-driven Q-learning PI, and on-policy data-driven Q-learning value iteration (VI); a minimal numerical sketch of a value-iteration variant follows the abstract. The effectiveness of the developed algorithms is verified through MATLAB simulation.

3. Considering that the initial data required for learning may be unavailable, the thesis proposes a dynamic output feedback controller based on the internal model principle, which supplies the initial data for the output feedback Q-learning algorithms. Through self-learning and optimization, the controller parameters converge to their optimal values.
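In the standard discounted LQT notation (the thesis' exact symbols may differ), the Q-function Bellman equation derived in step 1 reads

\[
Q(z_k,u_k) = z_k^{\top} Q_1 z_k + u_k^{\top} R\,u_k + \gamma\, Q(z_{k+1},u_{k+1}),
\qquad
Q(z_k,u_k) = \begin{bmatrix} z_k \\ u_k \end{bmatrix}^{\top} H \begin{bmatrix} z_k \\ u_k \end{bmatrix},
\]

where z_k is the augmented state built from the system state (or its input-output-reference reconstruction) and the reference trajectory, Q_1 and R are the tracking-error and input weights, and gamma is the discount factor. Partitioning H conformably, the greedy controller is u_k = -H_{uu}^{-1} H_{uz} z_k, which is what the actor structure learns.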
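For concreteness, the following is a minimal sketch of one such scheme, a data-driven Q-learning value iteration for LQT, written in Python/NumPy rather than MATLAB. For readability it assumes the augmented state z_k = [x_k; r_k] is directly measurable, whereas the thesis reconstructs this state from past input, output, and reference sequences to obtain output feedback. All matrices, weights, and variable names below are illustrative assumptions, not taken from the thesis.

import numpy as np

rng = np.random.default_rng(0)

# ---- simulator (unknown to the learner; used only to generate data) ----
A = np.array([[0.9, 0.2],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])
C = np.array([[1.0, 0.0]])
F = np.array([[1.0]])                 # reference generator: r_{k+1} = F r_k (constant here)

Qy, R, gamma = 10.0, 1.0, 0.8         # tracking weight, input weight, discount factor
nx, nr, m = 2, 1, 1
nz = nx + nr
Az = np.block([[A, np.zeros((nx, nr))],
               [np.zeros((nr, nx)), F]])
Bz = np.vstack([B, np.zeros((nr, m))]).ravel()
C1 = np.hstack([C, -np.eye(nr)])      # tracking error e_k = C x_k - r_k = C1 z_k
Q1 = Qy * (C1.T @ C1)                 # stage cost: z'Q1 z + R u^2

# ---- exploration data: tuples (z_k, u_k, z_{k+1}) under probing inputs ----
N = 400
Z, U, Zn = [], [], []
z = rng.standard_normal(nz)
for _ in range(N):
    u = rng.standard_normal()         # probing input for excitation
    zn = Az @ z + Bz * u
    Z.append(z); U.append(u); Zn.append(zn)
    z = zn if np.linalg.norm(zn) < 50 else rng.standard_normal(nz)

def phi(z, u):
    v = np.append(z, u)
    return np.kron(v, v)              # quadratic basis for Q(z,u) = [z;u]' H [z;u]

Phi = np.array([phi(z, u) for z, u in zip(Z, U)])
stage = np.array([z @ Q1 @ z + R * u * u for z, u in zip(Z, U)])

# ---- Q-learning value iteration: fit H_{j+1} to stage cost + gamma * min_u Q_j ----
H = np.zeros((nz + m, nz + m))
for it in range(100):
    if it == 0:
        vnext = np.zeros(N)           # Q_0 = 0
    else:
        Hzz, Hzu, Huu = H[:nz, :nz], H[:nz, nz:], H[nz:, nz:]
        P = Hzz - Hzu @ np.linalg.solve(Huu, Hzu.T)   # min_u Q_j(z,u) = z'Pz
        vnext = np.array([zn @ P @ zn for zn in Zn])
    w, *_ = np.linalg.lstsq(Phi, stage + gamma * vnext, rcond=None)
    Hnew = w.reshape(nz + m, nz + m)
    Hnew = 0.5 * (Hnew + Hnew.T)      # the data only identify the symmetric part of H
    if np.max(np.abs(Hnew - H)) < 1e-9:
        H = Hnew
        break
    H = Hnew

K = np.linalg.solve(H[nz:, nz:], H[nz:, :nz])         # greedy policy: u_k = -K z_k
print("learned gain K (u = -K [x; r]):", K)

Because the learner only ever sees the measured tuples (z_k, u_k, z_{k+1}) and the stage cost, the matrices Az and Bz never enter the iteration, which is the sense in which the scheme is data-driven.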
Keywords/Search Tags: Adaptive dynamic programming, Output feedback, Q-learning, Tracking control, Internal model principle