
Reinforcement Learning-based Optimal Control Methods With Applications To Mobile Robots

Posted on: 2015-05-13
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Yang
Full Text: PDF
GTID: 2348330509460660
Subject: Control Science and Engineering
Abstract/Summary:
Unlike supervised and unsupervised learning, reinforcement learning is a class of machine learning methods that obtains a reinforcement signal by interacting with the environment and uses a value function, or an estimate of the policy, to optimize sequential decision processes. To overcome the "curse of dimensionality" in problems with large state and action spaces, reinforcement learning with value function approximation is commonly used to solve large-scale optimal control problems. Because reinforcement learning depends only weakly on an accurate dynamic model and can optimize controllers from experience, it also holds great promise for path tracking control of mobile robots. Supported by the Natural Science Foundation of China, this thesis studies reinforcement learning based on value function approximation and manifold methods, and combines reinforcement learning with classical control to achieve high-precision path tracking for mobile robots. The main contributions of this thesis are as follows:

1. Building on linear temporal difference learning with gradient correction (TDC), two improved optimal control methods, an improved Q-learning algorithm and an improved HDP algorithm, are proposed to extend TDC from learning prediction to learning control. Because TDC is a true stochastic gradient descent method, convergence of the improved Q-learning method can be guaranteed under off-policy training. Experiments on the mountain-car and inverted pendulum systems validate the efficiency of the proposed methods, and performance under different learning-rate settings is also tested and analyzed. (A minimal sketch of the TDC update appears after this list.)

2. To overcome the difficulty of choosing basis functions for approximators, a novel automatic basis function generation method is proposed and used to construct the critic network of the DHP algorithm, yielding a framework of Dual Heuristic Programming based on Geodesic Laplacian Eigenmaps (GLEM-DHP). Comparisons with other DHP methods on nonlinear dynamic systems show the superior performance of the proposed method in both simulation and physical experiments. (See the basis-generation sketch after this list.)

3. A better way of selecting PID parameters is investigated. Exploiting the learning ability of DHP, a PID control algorithm with self-learning parameters (DHP-PID) is proposed for mobile-robot path tracking: the PID gains are generated and adjusted according to the reference path and the system state so as to reduce the total tracking error. DHP-PID is tested on three different kinds of reference paths and outperforms a fixed-gain PID controller on all of them; the controller learned by DHP-PID is also validated on a Pioneer3-AT wheeled mobile robot in the MobileSim platform, with satisfactory simulation results. (A self-tuning PID sketch follows this list.)

4. Successful application to the Googol single-stage linear inverted pendulum system demonstrates the feasibility and efficiency of the GLEM-DHP method and lays a foundation for practical engineering applications of reinforcement learning in the real world.
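For reference, below is a minimal sketch of the linear TDC update that contribution 1 builds on, in the standard two-timescale form for linear value function approximation. The function name and parameters are illustrative, not the thesis code; the control extensions (improved Q-learning and improved HDP) apply an analogous correction to action-value weights.

```python
import numpy as np

def tdc_update(theta, w, phi, phi_next, reward, gamma, alpha, beta):
    """One TDC (TD with gradient correction) step for linear value
    function approximation V(s) ~= theta . phi(s).

    theta : main weight vector
    w     : auxiliary weight vector estimating the expected TD error
    """
    delta = reward + gamma * theta @ phi_next - theta @ phi  # TD error
    # Corrected gradient step: the second term removes the bias that
    # makes plain off-policy TD(0) diverge under function approximation.
    theta = theta + alpha * (delta * phi - gamma * (w @ phi) * phi_next)
    # Auxiliary weights track the projection of the TD error onto phi.
    w = w + beta * (delta - w @ phi) * phi
    return theta, w
```

Because this update follows a true stochastic gradient, the convergence guarantee mentioned in contribution 1 carries over to off-policy sampling, which plain Q-learning with function approximation does not enjoy.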
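Contribution 2's automatic basis generation could look roughly like the following sketch: a k-nearest-neighbour graph over sampled states, heat-kernel weights on approximate geodesic distances, and the smoothest eigenvectors of the graph Laplacian taken as critic basis functions. The function name, neighbour count, and kernel width are assumptions for illustration; the exact GLEM-DHP construction in the thesis may differ.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path  # graph geodesic distances
from scipy.linalg import eigh

def laplacian_eigenmap_basis(states, k_neighbors=10, n_basis=20, sigma=1.0):
    """Hypothetical sketch in the spirit of GLEM-DHP basis generation.
    Assumes the sampled states form a single connected neighbourhood graph."""
    x = np.asarray(states, dtype=float)
    n = len(x)
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    # Keep only the k nearest neighbours to capture the manifold structure.
    adj = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(d[i])[1:k_neighbors + 1]  # skip self (distance 0)
        adj[i, idx] = d[i, idx]
    adj = np.maximum(adj, adj.T)               # symmetrise the graph
    geo = shortest_path(adj, directed=False)    # approximate geodesics
    w = np.exp(-geo**2 / (2 * sigma**2))        # heat-kernel edge weights
    lap = np.diag(w.sum(axis=1)) - w            # combinatorial Laplacian
    _, vecs = eigh(lap)                         # eigenvectors, ascending
    return vecs[:, :n_basis]                    # smoothest ones as basis
```

The returned columns serve as state features for the critic, so no basis functions need to be hand-designed for each new system.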
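Contribution 3's self-tuning idea can be sketched as a PID loop whose gains are supplied by a learned model at each time step. The class below and its `gain_model` hook are hypothetical placeholders standing in for the DHP actor described in the thesis:

```python
class SelfTuningPID:
    """Minimal sketch of the DHP-PID idea: a learned mapping supplies the
    PID gains per step, so the controller adapts to the current reference
    path and robot state. Names are illustrative, not the thesis code."""

    def __init__(self, gain_model, dt):
        self.gain_model = gain_model  # state -> (kp, ki, kd), e.g. a DHP actor
        self.dt = dt
        self.integral = 0.0
        self.prev_error = 0.0         # zero initial derivative on first step

    def step(self, state, error):
        kp, ki, kd = self.gain_model(state)  # gains scheduled by the learner
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return kp * error + ki * self.integral + kd * derivative
```

A fixed-gain PID controller is the special case where `gain_model` ignores the state, which is the baseline the thesis reports DHP-PID beating on all three reference paths.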
Keywords: Reinforcement Learning, Value Function Approximation, Temporal Difference Learning, Manifolds, PID Control, Mobile Robot Path Tracking Control