Reinforcement Learning Control Algorithm For Two-wheeled Balancing Vehicle

Posted on:2019-10-11

Degree:Master

Type:Thesis

Country:China

Candidate:P H Xia

Full Text:PDF

GTID:2428330566998452

Subject:Control Science and Engineering

Abstract/Summary:

Reinforcement learning, an important branch of artificial intelligence, has drawn much attention in recent years. With high-tech firms such as Google and Baidu devoting themselves to artificial intelligence research, and with a plan for developing artificial intelligence written into our government work report this year, the age of AI is coming. How to apply reinforcement learning to conventional control problems, and thereby give machines intelligence, has therefore become a hot topic. The self-balancing vehicle is a typical plant that is multivariable, closely coupled, and inherently unstable, so it is often used to test whether a control algorithm works. So far, concrete applications of reinforcement learning have not become widespread, and self-balancing vehicles are mainly controlled with classical control theory. Conventional reinforcement learning algorithms concentrate on discrete variables and need to store every value function entry in a table, whereas the control of a self-balancing vehicle is a continuous control problem.

In this dissertation, we focus on using value function approximation to handle the continuous state space. A BP (backpropagation) network is used, which has strong generalization ability and can realize most mapping relations, such as the mapping from state to value function. The table that stores the value function is thus replaced by the network, and control over a continuous state space is achieved with only a small set of weights. On the basis of value function approximation, we analyze the structure of actor-critic algorithms and search for the policy directly by parameterizing it, so that the system can output continuous actions. To do this, we adopt two BP networks as the actor unit and the critic unit respectively, and train both on the TD (temporal-difference) error during reinforcement learning. Since only one data sample is available at each step, the network weights are adjusted by stochastic gradient descent.

For the self-balancing vehicle, we establish an accurate mathematical model using the Lagrange method, which describes the state transition process. Finally, the simulation results show that the algorithm can handle continuous state and action spaces and achieves a desirable control effect.
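The actor-critic scheme summarized in the abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation: the class `TinyMLP`, the functions `pendulum_step` and `train`, the network sizes, the learning rates, the Gaussian exploration noise, and the reward are all assumptions chosen for illustration, and the plant is a toy inverted pendulum rather than the Lagrange model of the two-wheeled vehicle. It shows two small backpropagation networks acting as actor and critic, both updated once per step from the TD error by stochastic gradient descent.

```python
import numpy as np


class TinyMLP:
    """One-hidden-layer network trained by backpropagation (hypothetical helper)."""

    def __init__(self, n_in, n_hidden, rng):
        self.W1 = rng.normal(0.0, 0.5, (n_hidden, n_in))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, 0.1, n_hidden)
        self.b2 = 0.0

    def forward(self, x):
        # Cache the input and hidden activations for the next call to step().
        self.x = x
        self.h = np.tanh(self.W1 @ x + self.b1)
        return float(self.w2 @ self.h + self.b2)

    def step(self, scale):
        """Add scale * d(output)/d(weights), using the last forward() input."""
        dh = self.w2 * (1.0 - self.h ** 2)  # backprop through tanh
        self.w2 += scale * self.h
        self.b2 += scale
        self.W1 += scale * np.outer(dh, self.x)
        self.b1 += scale * dh


def pendulum_step(state, force, dt=0.02):
    """Toy plant (an assumption): theta'' = g * sin(theta) + force, Euler step."""
    theta, omega = state
    omega = omega + dt * (9.81 * np.sin(theta) + force)
    theta = theta + dt * omega
    return np.array([theta, omega])


def train(episodes=50, gamma=0.95, lr_actor=0.01, lr_critic=0.05,
          sigma=0.5, seed=0):
    rng = np.random.default_rng(seed)
    actor = TinyMLP(2, 8, rng)   # maps state to mean action
    critic = TinyMLP(2, 8, rng)  # maps state to value estimate
    td_errors = []
    for _ in range(episodes):
        s = rng.uniform(-0.05, 0.05, size=2)
        for _ in range(200):
            mu = actor.forward(s)
            a = mu + sigma * rng.normal()  # continuous action with exploration
            s_next = pendulum_step(s, a)
            fallen = abs(s_next[0]) > 0.5
            r = -1.0 if fallen else 0.0
            # Evaluate s_next first so the critic's cache ends up holding s.
            v_next = 0.0 if fallen else critic.forward(s_next)
            v = critic.forward(s)
            delta = r + gamma * v_next - v  # TD error from one sample
            critic.step(lr_critic * delta)  # semi-gradient TD(0) update
            # Gradient of the Gaussian log-probability w.r.t. the mean,
            # weighted by the TD error.
            actor.step(lr_actor * delta * (a - mu) / sigma ** 2)
            td_errors.append(delta)
            if fallen:
                break
            s = s_next
    return actor, critic, np.array(td_errors)


if __name__ == "__main__":
    actor, critic, td = train()
    print("steps:", td.size, "mean |TD error|:", np.abs(td).mean())
```

Because each update uses a single transition, the critic replaces the value table with a handful of weights, and the actor outputs a real-valued action directly. No tuning was done here, so the sketch should not be expected to reproduce the control performance reported in the dissertation.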

Keywords/Search Tags:

reinforcement learning, self-balancing vehicle, continuous space, actor-critic, artificial neural network

Related items

1	Research On Approximate Reinforcement Learning In Continuous Space
2	Actor-Critic Algorithms With Continuous Action Spaces
3	Research On Deep Reinforcement Learning Algorithm In Continuous Action Space
4	Research On Actor-Critic Algorithm Based On The Bayesian Theory
5	Exploratory Action Correction Algorithm Based On Actor-Critic
6	Research On Deep Reinforcement Learning Algorithm Based On Dual-Agent Cooperation
7	Researches On Improvement Of Fixed Temperature Soft Actor Critic Algorithm
8	Option Learning Method Research With Double Actor-Critic Architecture
9	Research On The Exploration Performance Of Policy Based On Actor-Critic Framework
10	Research On Three Key Problems In Reinforcement Learning