
Research On Reinforcement Learning Based On Value Function Approximation And State Space Decomposition

Posted on: 2012-10-07
Degree: Master
Type: Thesis
Country: China
Candidate: L Zuo
Full Text: PDF
GTID: 2218330362960212
Subject: Control Science and Engineering
Abstract/Summary:
Reinforcement learning (RL) is effective at solving uncertain sequential decision problems and has become one of the key research topics in machine learning in recent years. Overcoming the curse of dimensionality and achieving good generalization in continuous state spaces are crucial to the future development and application of reinforcement learning, and this thesis addresses both. At the same time, as mobile robots come into wide use, ever higher demands are placed on their intelligent navigation technologies; improving autonomous navigation ability and adaptability is essential for deploying mobile robots successfully in unknown environments. This thesis studies reinforcement learning algorithms based on value function approximation and state space decomposition, and applies them to obstacle avoidance for a mobile robot in unknown environments. The main contributions and innovations can be summarized as follows:

1. A k-means clustering based representation policy iteration (RPI) algorithm is proposed. The RPI algorithm based on the graph Laplacian is studied, and clustering analysis is introduced to improve the selection of the point set used to construct the graph. The resulting k-means clustering based RPI algorithm is evaluated in simulation, and the results show that it efficiently enhances the generalization ability of RPI.

2. Real-time learning control of an inverted pendulum system is achieved. Building on the study of value function approximation methods, RPI and the k-means clustering based RPI are applied to real-time, model-free learning control of the pendulum system. The results show that the reinforcement-learning-based control methods achieve good performance.
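As a rough illustration of contribution 1, the following Python/NumPy sketch selects representative points with k-means and then builds basis functions from the smoothest eigenvectors of the graph Laplacian of a nearest-neighbor graph over those points. All function names, the combinatorial (unnormalized) Laplacian, and the graph-construction details are assumptions for illustration, not the thesis's actual implementation:

```python
import numpy as np

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm: pick k representative centers from sampled states."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((points[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return centers

def laplacian_basis(centers, n_basis, n_neighbors=4):
    """Basis functions: the smoothest eigenvectors of the graph Laplacian of a
    symmetric k-nearest-neighbor graph built over the cluster centers."""
    dist = np.linalg.norm(centers[:, None] - centers[None], axis=-1)
    W = np.zeros_like(dist)
    for i in range(len(centers)):
        for j in np.argsort(dist[i])[1:n_neighbors + 1]:  # skip self at index 0
            W[i, j] = 1.0
    W = np.maximum(W, W.T)           # symmetrize the adjacency matrix
    L = np.diag(W.sum(axis=1)) - W   # combinatorial graph Laplacian L = D - W
    _, vecs = np.linalg.eigh(L)      # eigenvectors, sorted by ascending eigenvalue
    return vecs[:, :n_basis]         # low-frequency eigenvectors span smooth functions
```

The value function would then be approximated as a linear combination of these basis columns, with weights fitted by a policy-evaluation step; the point of the k-means stage is that the graph is built on a small set of representative centers rather than on every raw sample.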
This research is valuable for the engineering application of reinforcement learning.

3. A hierarchical representation policy iteration (HRPI) algorithm based on state space decomposition is proposed. Hierarchical reinforcement learning algorithms are studied, and HRPI is obtained by combining RPI with a state space decomposition method: the state space is decomposed into hierarchies according to the approximate value function, and a policy is learned on each hierarchy separately. Simulation results indicate that the method performs better on time-optimization problems.

4. A new obstacle avoidance method for mobile robots using RPI is proposed. A Markov decision process (MDP) model for autonomous obstacle avoidance of a mobile robot in unknown environments is introduced. The RPI algorithm is then combined with rolling window planning to obtain a new RPI-based autonomous obstacle avoidance method for mobile robots. Its control performance is tested in both simulation and physical experiments, and the results indicate that a mobile robot using the proposed method can successfully avoid obstacles in unknown environments.
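The rolling-window idea behind contribution 4 can be sketched as follows. This is a simplified stand-in: each local sensor window is solved here with plain value iteration on an occupancy grid rather than with RPI, and the grid model, rewards, and all names are illustrative assumptions, not the thesis implementation:

```python
import numpy as np

MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def solve_window(occ, goal, gamma=0.95, iters=200):
    """Value iteration on a local occupancy window; occ[i, j] == 1 marks an obstacle.
    The goal is the projection of the global target onto the window."""
    h, w = occ.shape
    V = np.zeros((h, w))
    for _ in range(iters):
        V_new = np.full((h, w), -np.inf)
        for i in range(h):
            for j in range(w):
                if occ[i, j]:
                    V_new[i, j] = -100.0   # large penalty pinned on obstacle cells
                    continue
                if (i, j) == goal:
                    V_new[i, j] = 0.0      # absorbing goal cell
                    continue
                for di, dj in MOVES:       # step cost -1, discounted lookahead
                    ni, nj = i + di, j + dj
                    if 0 <= ni < h and 0 <= nj < w:
                        V_new[i, j] = max(V_new[i, j], -1.0 + gamma * V[ni, nj])
        V = V_new
    return V

def first_move(occ, pos, goal, gamma=0.95):
    """Greedy first action from pos; the robot executes it, re-senses,
    and re-centers (rolls) the window before planning again."""
    V = solve_window(occ, goal, gamma)
    best, arg = -np.inf, None
    for di, dj in MOVES:
        ni, nj = pos[0] + di, pos[1] + dj
        if 0 <= ni < occ.shape[0] and 0 <= nj < occ.shape[1] and not occ[ni, nj]:
            if V[ni, nj] > best:
                best, arg = V[ni, nj], (ni, nj)
    return arg
```

At each control step the robot plans only within the freshly sensed window, executes the first move, and rolls the window forward, so no global map of the unknown environment is ever required.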
Keywords/Search Tags: Reinforcement Learning, Value Function Approximation, Representation Policy Iteration, State Space Decomposition, Autonomous Obstacle Avoidance