Reinforcement learning, an important branch of artificial intelligence, has drawn much attention in recent years. With high-tech firms such as Google and Baidu devoting themselves to artificial intelligence research, and with a plan for developing artificial intelligence written into this year's government work report, the age of AI is coming. How to apply reinforcement learning to conventional control problems and equip machines with intelligence has therefore become a hot topic. The self-balancing vehicle is a typical plant that is multivariable, strongly coupled, and inherently unstable, so it is often used to test whether a control algorithm works. So far, concrete applications of reinforcement learning have not become widespread, and self-balancing vehicles are still controlled mainly with classical control theory. Conventional reinforcement learning algorithms concentrate on discrete variables and need to store every value-function entry in a table, whereas controlling a self-balancing vehicle is a continuous control problem. In this dissertation, we focus on value function approximation to handle the continuous state space. A back-propagation (BP) network is used for this purpose; it has strong generalization ability and can realize most mappings, such as the mapping from state to value function. The table that stores value functions is thus replaced by the network, and control over a continuous state space is achieved with only a small set of weights. On the basis of value function approximation, we analyze the structure of actor-critic algorithms and search for the policy directly by parameterizing it, so that the system can output continuous actions. To do this, we adopt two BP networks as the actor unit and the critic unit respectively, and both networks are trained on the TD error during reinforcement learning. Since only one data sample is available at each step, the network weights are adjusted by stochastic gradient descent. For the self-balancing vehicle, we establish an accurate mathematical model using the Lagrange method, which describes the state transition process. Finally, the simulation results show that the algorithm can handle continuous state and action spaces and achieves a desirable control effect.
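
The single-sample actor-critic update described above can be illustrated with a minimal sketch. It is not the dissertation's implementation: for brevity, linear approximators stand in for the two BP networks, the policy is assumed to be Gaussian with a fixed standard deviation, and all names (`actor_critic_step`, `sample_action`) and hyperparameter values are hypothetical.

```python
import random


def dot(w, x):
    """Inner product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))


def sample_action(w_actor, s, sigma=0.5, rng=random):
    """Draw a continuous action from N(w_actor . s, sigma^2) (assumed policy form)."""
    return rng.gauss(dot(w_actor, s), sigma)


def actor_critic_step(w_critic, w_actor, s, a, r, s_next, done,
                      gamma=0.95, alpha_c=0.1, alpha_a=0.01, sigma=0.5):
    """One stochastic-gradient update from a single transition (s, a, r, s_next)."""
    # Critic estimate V(s) = w_critic . s (linear stand-in for the critic network).
    v = dot(w_critic, s)
    v_next = 0.0 if done else dot(w_critic, s_next)

    # TD error: one-step bootstrapped target minus the current estimate.
    delta = r + gamma * v_next - v

    # Critic: semi-gradient TD(0); the gradient of V(s) w.r.t. the weights is s.
    new_critic = [w + alpha_c * delta * si for w, si in zip(w_critic, s)]

    # Actor: gradient of log N(a; mu, sigma^2) w.r.t. the mean weights,
    # scaled by the TD error.
    mu = dot(w_actor, s)
    grad_log_pi = [(a - mu) / sigma ** 2 * si for si in s]
    new_actor = [w + alpha_a * delta * g for w, g in zip(w_actor, grad_log_pi)]

    return new_critic, new_actor, delta
```

In a control loop, `sample_action` would pick the action for the current state, the vehicle model would return the reward and next state, and `actor_critic_step` would then be called once per time step, which is why plain stochastic gradient descent on a single sample is the natural update rule here.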