With the rapid development of artificial intelligence, reinforcement learning, as a machine learning method, can effectively solve many complex decision-making and control problems and has received increasing attention in the field of control. The application of reinforcement learning to automatic controller design has become a research hotspot. Reinforcement learning methods can effectively solve complex control problems such as nonlinear H∞ robust control, event-triggered control, and multi-agent control, and the optimal control policy for a given performance index can be obtained even when the system model is unknown. As an intelligent controller design method, reinforcement learning is widely used in industrial process control.

The mainstream reinforcement learning algorithms are value iteration and policy iteration. Both first evaluate the currently applied policy from its closed-loop process data and then improve the policy based on the evaluation; before the algorithm has fully captured the dynamics of the system, it is difficult to guarantee stable operation. Online adaptive control requires parameter updating and control to proceed simultaneously, and the algorithm itself must be able to maintain the stability of the system. This paper studies reinforcement learning algorithms that ensure the stability of the initial sampling process when neither the system model nor an admissible control is known, so that the policy is optimized and learned online directly during sampling, eliminating tedious iterative computation.

Model-based adaptive dynamic programming can maintain system stability while optimizing the control policy, but it is strongly affected by model error; robust adaptive dynamic programming relies on prior knowledge of the disturbance effect and is difficult to design. To overcome these problems, a robust adaptive dynamic programming algorithm with an adaptive modeling mechanism is proposed. The adaptive modeling algorithm learns the dynamic model of the system online and estimates the disturbance effect, relieving the algorithm's dependence on prior knowledge of the disturbance. The planning algorithm uses the disturbance estimate to augment the state vector and learns the optimal "disturbance feedforward + state feedback" control policy from a cost function that includes the state estimation error, reducing the influence of disturbances and system parameter uncertainty. Nonlinear system simulations show that, compared with the zero-sum game method, the algorithm reduces the L2 gain of the closed-loop system and greatly improves the robustness of the control to disturbances and uncertainties.

Model-free adaptive control algorithms based on reinforcement learning generally need an "initial admissible control" to ensure system stability and the boundedness of the cost function during policy exploration. However, the initial admissible control is obtained from the system model, so the admissible-control assumption contradicts the model-free setting itself, which seriously restricts the practicality of reinforcement learning. To solve this problem, a new performance index function is proposed to quantitatively describe the difference between the change rate of the cost function and an exponential decay rate. It is theoretically proved that the optimal policy under this index has excellent robustness. In addition, the index integrates the cost function and its derivative with respect to time, which helps maintain the stability of the online learning process.
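As a minimal illustration of this idea only, such an index can be thought of as penalizing the amount by which the cost function fails to decay at a prescribed exponential rate; the value function V, the decay rate α, and the quadratic penalty below are assumptions for illustration, not the exact definition used in this work.

```latex
% Illustrative form only; V, \alpha and the quadratic penalty are assumptions.
% V(x(t)) : cost (value) function along the closed-loop trajectory
% \alpha>0 : prescribed exponential decay rate
J(u)=\int_{0}^{\infty}\Big(\dot V\big(x(t)\big)+\alpha\,V\big(x(t)\big)\Big)^{2}\,dt
```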
Simulations of a deep-diving rescue ship verify that the method realizes online adaptive control, with control performance better than the optimal policies obtained by the Q-learning and zero-sum game methods. The validity of the algorithm is further verified by rotary inverted pendulum experiments.

Reinforcement learning algorithms based on multilayer perceptrons have difficulty guaranteeing the safety of online interaction. To address this problem, a reinforcement learning framework containing multiple policy networks is proposed to solve the online adaptive control problem of nonlinear non-affine systems. The actor networks both compete and cooperate with one another. In each control cycle, the competition mechanism selects a control policy that can effectively maintain stability; it alleviates the poor stability and difficult convergence of perceptron-based control algorithms, realizes a multi-modal policy, and improves the robustness and generalization ability of the algorithm (a minimal sketch of this mechanism is given at the end of this section). The cooperation mechanism fuses the simulated trajectories of different policies to achieve more accurate policy evaluation. Compared with mainstream deep reinforcement learning algorithms, convergence is faster and sample efficiency is greatly improved.

As research on artificial intelligence algorithms, the generality of the proposed methods is of central importance. Therefore, the simulations and experiments of this study are carried out on different plants to test the generality of the algorithms. The simulation objects include systems of current interest in the control field, such as deep-diving rescue ships and unmanned aerial vehicles; the experimental object is a rotary inverted pendulum control system, whose measured data drive the learning algorithm to verify its effectiveness in practical applications.
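To make the competition mechanism concrete, the following sketch selects, in each control cycle, the candidate policy whose short simulated rollout predicts the lowest cost. It is illustrative only: the scalar plant model, quadratic cost, and linear candidate actors are placeholders for the learned model and multilayer-perceptron policy networks described above.

```python
# Minimal sketch of the competition mechanism (illustrative only; the plant
# model, cost, and linear "actors" are placeholders, not the actual networks).
import numpy as np

rng = np.random.default_rng(0)

def plant_model(x, u):
    """Assumed one-step model used to roll out candidate policies (placeholder)."""
    return 0.9 * x + 0.1 * u

def stage_cost(x, u):
    """Quadratic stage cost, chosen only for illustration."""
    return x ** 2 + 0.1 * u ** 2

# A population of simple linear "actors"; in the proposed framework these
# would be multilayer-perceptron policy networks learned online.
actors = [lambda x, k=k: -k * x for k in (0.5, 1.0, 2.0)]

def rollout_cost(actor, x0, horizon=20):
    """Predicted cost of applying one actor over a short simulated horizon."""
    x, cost = x0, 0.0
    for _ in range(horizon):
        u = actor(x)
        cost += stage_cost(x, u)
        x = plant_model(x, u)
    return cost

# Competition: in each control cycle, apply the actor whose simulated rollout
# predicts the lowest cost, i.e. the policy that best maintains stability.
x = 5.0
for t in range(10):
    costs = [rollout_cost(a, x) for a in actors]
    best_idx = int(np.argmin(costs))
    u = actors[best_idx](x)
    x = plant_model(x, u) + 0.01 * rng.standard_normal()  # plant step with small noise
    print(f"t={t}  selected actor={best_idx}  x={x:.3f}")
```

The cooperation mechanism described above would additionally reuse the simulated rollouts of all actors to refine a shared policy evaluation; that part is omitted from this sketch.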