
Research on Hierarchical Reinforcement Learning Algorithms and Their Application

Posted on: 2010-11-06    Degree: Doctor    Type: Dissertation
Country: China    Candidate: Y Zheng    Full Text: PDF
GTID: 1118360305987154    Subject: Computer application technology
Abstract/Summary:
In recent years, reinforcement learning has been one of the key research fields in artificial intelligence and machine learning. Reinforcement learning seeks an optimal control policy through trial-and-error interactions with a dynamic environment. Owing to its online adaptability and self-learning ability in continuous or high-dimensional systems, it has been widely applied in artificial intelligence, machine learning, and automatic control. In large-scale or continuous state spaces, however, reinforcement learning suffers from low learning efficiency and slow convergence. Hierarchical reinforcement learning, proposed in the 1990s, is an effective way to speed up convergence, and its sub-task policies can be reused, which accounts for its current popularity.

The main subject of this dissertation is hierarchical reinforcement learning and its knowledge transfer methods. The research covers two parts: first, how to improve hierarchical reinforcement learning so that it converges faster on a single task; second, for tasks whose state transition probabilities vary with parameters, how hierarchical reinforcement learning can obtain knowledge that is independent of those parameters and how knowledge transfer can accelerate learning across several such tasks. The main contributions of the dissertation are summarized as follows:

1. A reinforcement learning algorithm based on a stable state space is proposed, which accelerates learning by reducing the state space that must be explored and learned. A state pre-estimate rule and a modified reward signal ensure that the local stable state space converges independently and yields the optimal policies within it. An exploration policy based on the pre-estimate rule and an action-continuity rule confines exploration to the stable state space. The algorithm therefore makes substantial progress against the curse of dimensionality, since learning time increases exponentially only with the size of the local stable state space. The dissertation also points out that a limit cycle problem arises when reinforcement learning is applied to the inverted pendulum system: limit cycles make the learning algorithm oscillate and destroy the stability of the optimal control policy. A limit cycle detection method based on the balance state is proposed to resolve this problem. The resulting algorithm obtains a stable control policy, which provides a foundation for extracting knowledge in hierarchical reinforcement learning.

2. A hierarchical Option algorithm based on a qualitative model and a hierarchical step-by-step exploration policy are proposed to avoid the trade-off between exploration and exploitation faced by ordinary exploration policies. Based on the characteristics of the system, a novel qualitative model definition and a sub-optimal qualitative action pre-estimate rule are introduced. Following this rule, the hierarchical step-by-step exploration policy first selects a sub-optimal qualitative action (the "exploit" step) and then selects a concrete action within that qualitative action (the "explore" step), as sketched below.
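For illustration only, the following Python sketch shows one way such a two-level selection rule could look. The names `q_high`, `q_low` and `primitives` are assumptions introduced for the example, not data structures from the dissertation.

```python
import random

# Hypothetical sketch of a two-level, "step-by-step" exploration policy:
# exploit at the qualitative level, explore only inside the chosen subset.
# q_high scores qualitative (abstract) actions per state, primitives maps
# each qualitative action to its concrete actions, q_low scores those.

def hierarchical_step_exploration(state, q_high, q_low, primitives, epsilon=0.2):
    # Level 1 ("exploit"): greedily pick the best-rated qualitative action.
    qualitative = max(primitives, key=lambda a: q_high.get((state, a), 0.0))
    # Level 2 ("explore"): explore only among the concrete actions that
    # realize the chosen qualitative action, not the whole action set.
    candidates = primitives[qualitative]
    if random.random() < epsilon:
        action = random.choice(candidates)      # explore within the subset
    else:
        action = max(candidates, key=lambda u: q_low.get((state, u), 0.0))
    return qualitative, action
```

Because exploitation is confined to the qualitative level and exploration to the level below it, no explicit time-sharing between the two is needed, which is the point developed next.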
Unlike ordinary reinforcement learning algorithms, which divide the exploration policy's time between exploration and exploitation by time-sharing, this policy allocates exploration and exploitation to different hierarchical levels and is therefore free of the problems of time-sharing exploration schemes. Because the algorithm carries out knowledge transfer and system control in sub-tasks at different levels, it is also well suited to extracting the common features of tasks with different parameters. The hierarchical structure of the algorithm thus provides a foundation for knowledge transfer in hierarchical reinforcement learning.

3. An Option algorithm based on a qualitative fuzzy model is proposed. It overcomes the difficulty that ordinary knowledge transfer methods have with tasks whose state transition probabilities change with the parameters. The algorithm uses the qualitative model to describe the common features of tasks with different parameters, transforming parameter-related tasks into parameter-unrelated ones. A qualitative fuzzy network is proposed to learn the sub-optimal policy of the qualitative model and to extract the common features of that policy, yielding knowledge that is independent of the parameters. A reward signal based on the state path is also proposed to adapt the qualitative fuzzy network dynamically, making it applicable to systems with new parameter values. This qualitative-model-based transfer method describes the common control rules of tasks with different parameters, addresses the parameter sensitivity of ordinary knowledge transfer methods, and extends them from parameter-unrelated tasks to parameter-related tasks, as illustrated by the sketch below.

Directions for future research are discussed in the last chapter of the dissertation.
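To make the transfer idea of item 3 concrete, here is a minimal Python sketch, under stated assumptions, of reusing a parameter-independent qualitative policy on a new task instance (e.g. an inverted pendulum with a different pole length). `env`, `qualitative_policy` and `primitives` are hypothetical names for the example; `env` is assumed to expose `reset()` and `step(a)` returning `(next_state, reward, done)`. This is not the dissertation's implementation, only an illustration of seeding learning with qualitative knowledge and fine-tuning concrete values by Q-learning.

```python
import random
from collections import defaultdict

def transfer_then_refine(env, qualitative_policy, primitives,
                         episodes=100, alpha=0.1, gamma=0.99, epsilon=0.1):
    q = defaultdict(float)                      # concrete Q-values for the new task
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            qual = qualitative_policy(s)        # transferred, parameter-free choice
            acts = primitives[qual]             # concrete actions realizing it
            if random.random() < epsilon:
                a = random.choice(acts)         # explore within the subset only
            else:
                a = max(acts, key=lambda u: q[(s, u)])
            s2, r, done = env.step(a)
            # One-step Q-learning update; the bootstrap is restricted to the
            # concrete actions suggested by the qualitative policy at s2.
            next_acts = primitives[qualitative_policy(s2)]
            target = r if done else r + gamma * max(q[(s2, u)] for u in next_acts)
            q[(s, a)] += alpha * (target - q[(s, a)])
            s = s2
    return q
```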
Keywords/Search Tags: Reinforcement Learning, Option, Knowledge Transfer, Qualitative Model, Exploration Policy, Inverted Pendulum