
H∞ Tracking Control Method of Unknown Discrete-Time Linear Systems Based on Reinforcement Q-Learning and Its Application

Posted on: 2021-01-07    Degree: Master    Type: Thesis
Country: China    Candidate: Q Chen    Full Text: PDF
GTID: 2428330611965433    Subject: Control engineering
Abstract/Summary:
Many practical engineering systems exhibit model uncertainty, noise disturbance, and parameter variation during operation, which pose new problems for tracking control. Common remedies include adaptive control and robust control. Among them, robust H∞ feedback control handles bounded system uncertainty well, but designing the controller still requires knowledge of the system parameters, so the method is severely limited for systems with unknown parameters or unmeasurable states. This thesis therefore proposes a series of new online reinforcement Q-learning algorithms for designing H∞ tracking control policies for unknown discrete-time linear systems, addressing both the model uncertainty and the robust control of the controlled system. The main contributions are:

1) For the H∞ tracking control problem, an augmented system model composed of the original controlled system and the command generator is constructed, and a discounted performance function is introduced to establish the discounted game algebraic Riccati equation (GARE; a standard formulation is sketched after this abstract). Conditions for the existence of a unique solution to the GARE are established, a lower bound on the discount factor is given to guarantee closed-loop stability of the system, and the stability of the H∞ tracking control policy is proved.

2) Building on this model and theory, the on-policy Q-function and its recursive Bellman equation are derived, and both full-state-feedback and output-feedback on-policy reinforcement Q-learning algorithms are proposed that acquire the H∞ tracking control policies without knowledge of the system dynamics (see the Python sketch below). It is proved that the probing noise introduced to satisfy the persistent-excitation condition causes no bias in the parameter estimation of the Q-function Bellman equation, so the algorithms converge to the unique positive-definite solution of the GARE, which is the ideal solution. Output feedback is realized through state reconstruction: measured input, output, and reference-signal data replace the state (see the reconstruction sketch below), removing the requirement that the full state be measurable.

3) In the on-policy algorithms above, however, the disturbance must be iteratively updated as the worst-case disturbance policy during learning, so those algorithms cannot be applied to controlled systems whose disturbance cannot be regulated or disconnected. To overcome this defect and widen the applicability of reinforcement Q-learning, novel off-policy Q-learning algorithms based on full state feedback and output feedback are further proposed by combining the off-policy control idea (see the off-policy sketch below). The two off-policy Q-learning algorithms likewise enjoy unbiased parameter estimation and guaranteed convergence.

In summary, four on-policy and off-policy Q-learning algorithms based on full state feedback and output feedback are proposed for H∞ tracking control of unknown discrete-time linear systems. Simulations of a single-phase voltage-source UPS inverter and a grid-connected three-phase photovoltaic power-generation inverter verify the effectiveness of the proposed Q-learning algorithms.
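The abstract refers to the discounted game algebraic Riccati equation without displaying it. As a point of reference, the following is a standard formulation of this problem from the discounted H∞ tracking literature, not necessarily the thesis's exact notation; the augmented matrices T, B₁, E₁, the weight Q̄, the attenuation level β, and the discount factor γ are notational assumptions of this sketch.

```latex
% Plant x_{k+1} = A x_k + B u_k + E w_k, y_k = C x_k; command generator
% r_{k+1} = F r_k; augmented state X_k = [x_k; r_k].
\[
\begin{aligned}
X_{k+1} &= T X_k + B_1 u_k + E_1 w_k, \qquad
T = \begin{bmatrix} A & 0 \\ 0 & F \end{bmatrix},\;
B_1 = \begin{bmatrix} B \\ 0 \end{bmatrix},\;
E_1 = \begin{bmatrix} E \\ 0 \end{bmatrix}, \\[2pt]
J(u,w) &= \sum_{k=0}^{\infty} \gamma^{k}
  \bigl( X_k^{\top}\bar{Q}X_k + u_k^{\top}R\,u_k
         - \beta^{2} w_k^{\top}w_k \bigr),
\qquad \bar{Q} = [\,C \;\; -I\,]^{\top} Q_y [\,C \;\; -I\,], \\[2pt]
P &= \bar{Q} + \gamma T^{\top} P T
  - \gamma^{2} T^{\top} P\,[\,B_1 \;\; E_1\,]
  \begin{bmatrix}
    R + \gamma B_1^{\top} P B_1 & \gamma B_1^{\top} P E_1 \\
    \gamma E_1^{\top} P B_1 & \gamma E_1^{\top} P E_1 - \beta^{2} I
  \end{bmatrix}^{-1}
  \begin{bmatrix} B_1^{\top} \\ E_1^{\top} \end{bmatrix} P\,T .
\end{aligned}
\]
```

The lower bound on γ mentioned in contribution 1) arises because discounting weakens the stability guarantee of the undiscounted game solution; the thesis's specific bound is not reproduced here.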
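The on-policy algorithm of contribution 2) is described in prose only. Below is a minimal NumPy sketch of one standard way such an algorithm is arranged: a quadratic Q-function zᵀHz over z = [X; u; w], least-squares solution of the Q-function Bellman equation under probing noise, and policy improvement from the partitioned kernel H. The plant matrices, weights, discount factor, and gain formulas are illustrative assumptions, not the thesis's system, and the loop omits the stability safeguards a real implementation would need.

```python
import numpy as np

# ---- simulation-only plant (the learner never reads A, B, E, F directly) ----
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
E = np.array([[0.1], [0.0]])       # disturbance channel
C = np.array([[1.0, 0.0]])
F = np.array([[1.0]])              # command generator: r_{k+1} = F r_k
n, m, q, p = 2, 1, 1, 1            # dims of x, u, w, r

# Augmented dynamics X_{k+1} = T X_k + B1 u_k + E1 w_k, with X = [x; r]
T  = np.block([[A, np.zeros((n, p))], [np.zeros((p, n)), F]])
B1 = np.vstack([B, np.zeros((p, m))])
E1 = np.vstack([E, np.zeros((p, q))])

Qy, R = np.array([[5.0]]), np.array([[1.0]])
beta2, gamma = 4.0, 0.6            # attenuation level squared, discount factor
C1 = np.hstack([C, -np.eye(p)])    # tracking error e_k = y_k - r_k = C1 X_k
Qbar = C1.T @ Qy @ C1

N  = n + p                         # augmented state dimension
nz = N + m + q                     # dimension of z = [X; u; w]

def utility(X, u, w):
    """One-step game cost X'Qbar X + u'R u - beta^2 w'w."""
    return (X.T @ Qbar @ X + u.T @ R @ u - beta2 * (w.T @ w)).item()

def gains_from_H(H):
    """Policy improvement: minimax gains from the partitioned kernel H."""
    Hux, Huu, Huw = H[N:N+m, :N], H[N:N+m, N:N+m], H[N:N+m, N+m:]
    Hwx, Hwu, Hww = H[N+m:, :N],  H[N+m:, N:N+m],  H[N+m:, N+m:]
    Ku = np.linalg.solve(Huu - Huw @ np.linalg.solve(Hww, Hwu),
                         Hux - Huw @ np.linalg.solve(Hww, Hwx))
    Kw = np.linalg.solve(Hww - Hwu @ np.linalg.solve(Huu, Huw),
                         Hwx - Hwu @ np.linalg.solve(Huu, Hux))
    return Ku, Kw                  # policies: u = -Ku X, w = -Kw X

rng = np.random.default_rng(0)
Ku, Kw = np.zeros((m, N)), np.zeros((q, N))
for it in range(15):               # policy iteration
    Phi, Y = [], []
    X = rng.normal(size=(N, 1))
    for k in range(500):           # policy evaluation along one trajectory
        # behaviour = current policies + probing noise (persistent excitation)
        u = -Ku @ X + 0.3 * rng.normal(size=(m, 1))
        w = -Kw @ X + 0.3 * rng.normal(size=(q, 1))
        Xn = T @ X + B1 @ u + E1 @ w
        z  = np.vstack([X, u, w]).ravel()
        zn = np.vstack([Xn, -Ku @ Xn, -Kw @ Xn]).ravel()  # successor actions
        # Bellman residual row: z'Hz - gamma * zn'H zn = utility
        Phi.append(np.kron(z, z) - gamma * np.kron(zn, zn))
        Y.append(utility(X, u, w))
        X = Xn
    h, *_ = np.linalg.lstsq(np.asarray(Phi), np.asarray(Y), rcond=None)
    H = h.reshape(nz, nz)
    H = 0.5 * (H + H.T)            # only the symmetric part is identifiable
    Ku, Kw = gains_from_H(H)

print("learned tracking gain Ku =", Ku)
```

Note how the sketch mirrors the abstract's unbiasedness claim: the Q-function Bellman equation holds exactly for arbitrary current actions, so the probing noise enters the regression as data rather than as an error term, and the least-squares estimate of H is not biased by the excitation.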
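Contribution 2) also realizes output feedback via state reconstruction. A common form of this construction, assumed here following the standard output-feedback ADP literature, expresses the state from the last L inputs and outputs, where L is the observability index and M_u, M_y are hypothetical parameterization matrices (disturbance terms are omitted in this sketch):

```latex
\[
x_k = M_u\,\bar{u}_{k-1,k-L} + M_y\,\bar{y}_{k-1,k-L},
\qquad
\bar{u}_{k-1,k-L} = \begin{bmatrix} u_{k-1}\\ \vdots \\ u_{k-L} \end{bmatrix},\quad
\bar{y}_{k-1,k-L} = \begin{bmatrix} y_{k-1}\\ \vdots \\ y_{k-L} \end{bmatrix}.
\]
```

Substituting this expression (together with the reference samples) into z_k turns the Q-function into a quadratic form in measured data only, which is how the full-state-measurability requirement is avoided.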
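For contribution 3), one common off-policy arrangement, assumed here and possibly differing from the thesis's exact derivation, collects a single dataset {(X_k, u_k, w_k, X_{k+1})} under arbitrary behavior inputs and the naturally occurring disturbance, then re-solves at each iteration i a least-squares problem in which only the successor actions are replaced by the current target policies:

```latex
\[
z_k^{\top} H^{i} z_k \;-\; \gamma\,\hat{z}_{k+1}^{\top} H^{i}\hat{z}_{k+1}
= X_k^{\top}\bar{Q}X_k + u_k^{\top} R\,u_k - \beta^{2} w_k^{\top} w_k,
\qquad
\hat{z}_{k+1} =
\begin{bmatrix} X_{k+1}\\ -K_u^{i} X_{k+1}\\ -K_w^{i} X_{k+1}\end{bmatrix}.
\]
```

Because the measured disturbance w_k enters the regression as data rather than as a policy to be executed, the disturbance never needs to be regulated or disconnected, which is precisely the limitation of the on-policy scheme that contribution 3) removes.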
Keywords/Search Tags: H∞ tracking control, reinforcement Q-learning, discount factor, output feedback, off-policy control