
Research on Reinforcement Learning and Its Application

Posted on: 2011-08-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M L Xu
Full Text: PDF
GTID: 1118360302487725
Subject: Light Industry Information Technology and Engineering

Abstract/Summary:
Reinforcement learning (RL) is an important machine learning framework in which an agent obtains an optimal policy through interaction with its environment. The policy is updated according to the reward or punishment, i.e., the reinforcement signal, given by the environment. Reinforcement learning requires little prior knowledge of the environment and can learn online in a real-time environment, so it has attracted many researchers and is widely used in intelligent control and sequential decision making.

The main aim of reinforcement learning is to learn the mapping from the state space to the action space. This mapping is determined completely by value function estimation, such as the state value function and the state-action value function, which can in essence be approximated by parameterized functions. Classical reinforcement learning considers only small-scale discrete state and action spaces, with the value function stored in a look-up table (LUT). To improve the performance of reinforcement learning in large-scale discrete spaces and in continuous state or action spaces, hierarchical learning and generalization methods have been introduced.

In terms of hierarchical learning, hierarchical reinforcement learning (HRL) frameworks such as Options, HAM and MAXQ have been presented. The key problem in hierarchical reinforcement learning is to automatically decompose a task into several appropriate sub-tasks. The Option framework is widely used because sub-tasks are easy to generate automatically, e.g. by partitioning the state space into regions or stages around bottleneck states. To generalize RL to continuous state or action spaces, generalization methods such as neural networks and fuzzy inference systems are introduced. Q-learning is easy to understand and to implement, and is therefore widely used.
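As a concrete illustration of the classical tabular setting described above, the following is a minimal sketch of Q-learning with a look-up table on a toy one-dimensional chain task; the environment, parameter values and episode budget are illustrative assumptions, not taken from the dissertation.

```python
import random

random.seed(0)  # for reproducibility of this toy run

# Minimal tabular (look-up table) Q-learning on a toy 1-D chain:
# states 0..4, actions -1/+1, reward 1 on reaching the goal state 4.
# The task and all parameter values are illustrative assumptions.
N_STATES, GOAL = 5, 4
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
ACTIONS = (-1, 1)

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    s2 = min(max(s + a, 0), GOAL)
    return s2, (1.0 if s2 == GOAL else 0.0)

for episode in range(1000):
    s = 0
    for _ in range(100):  # cap episode length
        # epsilon-greedy action selection, breaking ties randomly
        if random.random() < EPSILON or Q[(s, -1)] == Q[(s, 1)]:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r = step(s, a)
        # TD update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        best_next = 0.0 if s2 == GOAL else max(Q[(s2, a2)] for a2 in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s2
        if s == GOAL:
            break

# The learned greedy policy should move right (+1) from every non-goal state.
policy = {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(GOAL)}
```

Because the look-up table stores one entry per state-action pair, this representation stops scaling once the state space grows large or becomes continuous, which is exactly the motivation for the hierarchical and generalization methods discussed above.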
In the related literature, a neural network or fuzzy inference system is used to approximate the action-value function indirectly: the inputs of the network or fuzzy system are the states, and the outputs are the Q-values of several corresponding discrete actions. The action that acts on the environment is then derived from these discrete 'seed' actions. The choice of the seed actions plays an important role in such methods: a bad choice may degrade the performance of the learning, and unfortunately there is no prior knowledge available for choosing the discrete seed actions.

This dissertation first summarizes the background and theory of reinforcement learning, then focuses on automatic Option construction based on bottleneck states in the state space, and on directly approximating the action-value function in continuous state and action spaces with a neural network and a fuzzy inference system.

Wheeled mobile robots can move and work autonomously in certain environments and have been widely used in many areas such as industry, agriculture, daily life and military affairs. Navigation is the most fundamental and important function of a wheeled mobile robot. Reinforcement learning is widely used in the navigation of wheeled mobile robots for its merits of online adaptability, self-learning ability for complex systems and human-like reasoning. This dissertation focuses on a wall-following navigation method based on reactive control.

The main content and contributions of this dissertation include:

1. Automatic construction of Options based on taboo states. In this method, taboo states are introduced into the environment so that the agent can automatically construct Options. During interaction with the environment, the learning agent can automatically discover bottlenecks and choose appropriate bottleneck states as the sub-goals of Options. Moreover, the initiation set of each Option can be obtained and the policies of the Options can be learned simultaneously.
Several grid-world tasks illustrate that the agent can automatically construct useful Options online.

2. The RBFQ algorithm. Although the scale of a radial basis function (RBF) neural network can be large, it has both local and universal approximation capability, learns quickly and converges rapidly. To avoid choosing seed actions, the RBF network is used to approximate the action-value function directly. The structure and parameters of the RBF network are identified automatically and simultaneously in an adaptive, self-organizing way, according to the TD error and the distance between the state-action pair and the centers of the radial basis functions. An optimization method is used to search for the greedy action. Experimental results on the balancing control of a cart-pole system demonstrate the superiority and applicability of the proposed method.

3. Q-learning based on a fuzzy inference system. In this method, a fuzzy inference system (FIS) is used to approximate the action-value function. The number of rules increases adaptively, and the parameters of both the consequent and premise parts of the fuzzy inference system are updated. An optimization method is used to search for the greedy action. Experimental results on the balancing control of a cart-pole system demonstrate the applicability of the proposed method.

4. Navigation of mobile robots based on reinforcement learning. Simulation results demonstrate that the proposed AFQL method, with good generalization and efficiency, can accomplish the wall-following task for mobile robots.
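Contribution 1 above builds Options around automatically discovered bottleneck states. The following is a hedged sketch of one common bottleneck heuristic (visit frequency on successful trajectories), not the dissertation's exact taboo-state algorithm; the state names and threshold are hypothetical.

```python
from collections import Counter

def bottleneck_candidates(trajectories, threshold=0.8):
    """Return states visited in at least `threshold` of the successful
    trajectories, excluding each trajectory's own start and goal states."""
    counts = Counter()
    for traj in trajectories:
        counts.update(set(traj[1:-1]))  # drop start and goal, count once per path
    n = len(trajectories)
    return {s for s, c in counts.items() if c / n >= threshold}

# Three hypothetical successful paths through a two-room grid world;
# the doorway between the rooms is the bottleneck they all share.
paths = [
    ["start", "a1", "door", "b1", "goal"],
    ["start", "a2", "door", "b2", "goal"],
    ["start", "a1", "a2", "door", "b1", "goal"],
]
print(bottleneck_candidates(paths))  # {'door'}
```

A state selected this way can then serve as an Option's sub-goal, with the states from which it is commonly reached forming the Option's initiation set.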
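Contribution 2 approximates the action-value function directly over the joint state-action input. Below is a minimal sketch of this idea with a fixed-size RBF network and a TD-error weight update; the dissertation's RBFQ additionally grows the network online and optimizes the greedy action, and all dimensions and parameter values here are illustrative assumptions.

```python
import numpy as np

# Sketch: Q(s, a) ~ w . phi(s, a), where phi holds Gaussian RBF activations over
# the joint state-action vector. Centers are fixed here for brevity; the
# dissertation's RBFQ grows and tunes them online via the TD error and the
# distance to the centers. All sizes and constants are illustrative.
rng = np.random.default_rng(0)
centers = rng.uniform(-1.0, 1.0, size=(20, 3))  # 20 RBFs over (2 state dims + 1 action dim)
weights = np.zeros(20)
SIGMA, ALPHA, GAMMA = 0.5, 0.05, 0.95

def phi(s, a):
    x = np.concatenate([s, [a]])
    return np.exp(-np.sum((centers - x) ** 2, axis=1) / (2 * SIGMA ** 2))

def q(s, a):
    return weights @ phi(s, a)

def td_update(s, a, r, s2, candidate_actions):
    """One TD step; the greedy value at s2 is approximated here by a coarse
    search over a few candidate actions."""
    global weights
    target = r + GAMMA * max(q(s2, a2) for a2 in candidate_actions)
    delta = target - q(s, a)                        # TD error
    weights = weights + ALPHA * delta * phi(s, a)   # gradient step on the weights
    return delta

# Repeating one hypothetical transition: the TD error shrinks as weights adapt.
s0, s1 = np.array([0.1, -0.2]), np.array([0.2, -0.1])
deltas = [abs(td_update(s0, 0.0, 1.0, s1, (-1.0, 0.0, 1.0))) for _ in range(200)]
```

Because the action enters the network as an input rather than indexing a discrete output head, no seed actions need to be chosen in advance, which is the point of the direct approximation.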
Keywords/Search Tags: reinforcement learning, Q-learning, Option, navigation of mobile robots