State feedback control and optimal control of dynamic systems are fundamental concepts in control theory and play an important role in both theoretical research and practical applications. However, the uncertainty inherent in stochastic systems makes these control problems more difficult to study. This paper discusses state feedback control and optimal control for several classes of discrete and continuous stochastic systems using reinforcement learning (RL), a popular paradigm in artificial intelligence. The main results are as follows.

For a class of discrete stochastic systems, namely Probabilistic Boolean Control Networks (PBCNs), an improved Q-learning algorithm called Weight Speedy Q-learning (WSQL) is proposed. Based on the WSQL algorithm, a state feedback controller is designed to stabilize the PBCN, and the effectiveness of the proposed algorithm is verified by simulation. Compared with traditional control methods, our method is applicable to model-free systems; compared with the standard Q-learning algorithm, the proposed algorithm converges to an equilibrium point of the PBCN faster.

For a class of continuous stochastic control systems, a reinforcement learning method is proposed for the optimal control of continuous stochastic systems under periodic impulse control and event-based impulse control. We formulate a Markov decision process (MDP) that converts the optimal control problem into the search for an optimal policy within the RL framework, and on this basis we propose an algorithm based on Deep Deterministic Policy Gradient (DDPG) to find the optimal policy. An example is given to demonstrate the effectiveness of the algorithm. Compared with traditional methods, the proposed algorithm does not require knowledge of the system's dynamic model and can also be applied to high-dimensional coupled systems; compared with the Q-learning algorithm, the value it obtains is closer to the optimal value.

In the last part of this paper, for continuous stochastic control systems, we use reinforcement learning to study optimal control with a zero-order hold. In this setting, we must design not only the control parameters but also the optimal sampling times under different sampling schemes; we discuss optimal deterministic sampling and optimal level-crossing sampling, respectively. Simulation results illustrate the effectiveness of our method. Compared with existing results, our method can be applied not only to model-free systems but also to more complex systems.
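As an illustration of the WSQL idea, the following is a minimal tabular sketch in Python, assuming a weighted variant of the speedy Q-learning update in which a weight w scales the speedy correction term; the function name, the weighting scheme, and all parameter values here are illustrative assumptions rather than the exact update rule proposed in this paper.

```python
import numpy as np

def wsql_update(Q, Q_prev, s, a, r, s_next, alpha, gamma=0.95, w=0.5):
    """Hypothetical weighted speedy Q-learning step on tabular Q-values.

    Blends the Bellman targets computed from the current table Q and the
    previous table Q_prev; w weights the speedy correction term. The exact
    WSQL rule used in this paper may differ.
    """
    t_prev = r + gamma * np.max(Q_prev[s_next])   # target from the previous table
    t_curr = r + gamma * np.max(Q[s_next])        # target from the current table
    q_new = (Q[s, a]
             + alpha * (t_prev - Q[s, a])
             + w * (1.0 - alpha) * (t_curr - t_prev))
    Q_prev[s, a] = Q[s, a]                        # entry-wise copy of the old value
    Q[s, a] = q_new
    return Q, Q_prev

# Usage sketch on a PBCN abstracted as a finite MDP with n states and m actions.
n, m = 8, 2
Q, Q_prev = np.zeros((n, m)), np.zeros((n, m))
# Inside an episode loop, after observing (s, a, r, s_next), one would call:
# Q, Q_prev = wsql_update(Q, Q_prev, s, a, r, s_next, alpha=0.1)
```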
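For the continuous-time impulse control part, the following is a minimal sketch of the MDP construction, assuming a scalar controlled diffusion simulated by Euler-Maruyama with an impulse applied every h seconds; the class name, dynamics, and cost are illustrative assumptions, not the model studied in this paper. A standard DDPG agent can then be trained against such an environment, and for the zero-order-hold setting the action could be extended to also contain the next sampling interval.

```python
import numpy as np

class ImpulseControlEnv:
    """Hypothetical MDP wrapper for the SDE dX_t = a*X_t dt + sigma dW_t,
    where the policy applies an impulse u to the state every h seconds.
    The reward is the negative of a quadratic running-plus-impulse cost."""

    def __init__(self, a=-0.5, sigma=0.2, h=0.1, dt=0.01, horizon=5.0):
        self.a, self.sigma, self.h, self.dt, self.horizon = a, sigma, h, dt, horizon
        self.reset()

    def reset(self):
        self.x, self.t = 1.0, 0.0
        return np.array([self.x], dtype=np.float32)   # observation = current state

    def step(self, u):
        self.x += float(u)                            # apply the impulse chosen by the policy
        cost = float(u) ** 2                          # impulse (control) cost
        for _ in range(int(self.h / self.dt)):        # simulate until the next impulse instant
            self.x += self.a * self.x * self.dt + self.sigma * np.sqrt(self.dt) * np.random.randn()
            cost += self.x ** 2 * self.dt             # running state cost
            self.t += self.dt
        done = self.t >= self.horizon
        return np.array([self.x], dtype=np.float32), -cost, done, {}
```

An event-based variant would call step() only when the state crosses a prescribed threshold instead of at fixed intervals.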