
Research on Methods for Accelerating Reinforcement Learning

Posted on: 2011-07-18    Degree: Doctor    Type: Dissertation
Country: China    Candidate: Z Jin    Full Text: PDF
GTID: 1118360308481260    Subject: Communication and Information System
Abstract/Summary:
The autonomy and biological plausibility of Reinforcement Learning (RL) have attracted considerable interest from researchers in the Machine Learning and Artificial Intelligence communities, and RL has demonstrated its applicability and effectiveness in many problem domains. However, its slow learning process and low learning efficiency remain a formidable obstacle that keeps RL from scaling to problems with large state spaces.

At present, two classes of approaches work well for speeding up reinforcement learning. One is Hierarchical Reinforcement Learning (HRL), which accelerates learning through task decomposition; the other is Shaping Reinforcement Learning (SRL), which accelerates learning by limiting the state space the agent must search. Both share a common drawback: the task decomposition and the shaping signal depend on an outside observer, so their capacity to accelerate learning is constrained by the observer's ability to analyze the problem. If the outside observer cannot decompose the learning task or provide a shaping signal, both approaches lose their effect. In this work, we implement an approach, Shaping Bayesian Network based Reinforcement Learning (SBN-RL), in which the knowledge and experience the agent acquires from its preceding learning are used to decompose the learning task and to shape the agent for its subsequent learning. This achieves acceleration both through task decomposition and through limiting the state space the agent searches, and both are accomplished solely from the agent's own accumulated knowledge and experience. The dependence on an outside observer is removed entirely, which resolves the problem of the speedup being bounded by the observer's capability and allows the agent not only to learn by itself but also to accelerate its learning by itself.

Concretely, we first compute State-Clusters from the State-Action Transitions collected over the agent's training episodes. These accumulated State-Clusters are then used to build the Shaping Bayesian Network (SBN), the agent's reorganized model of the original state space, which expresses and records the knowledge and experience acquired during learning. A Critical State in the SBN, a state that every path from the initial state to the goal state must pass through, serves as a phased goal, so the whole learning task is decomposed into smaller learning sub-tasks. This exploits the "separation of concerns" strategy to speed up learning, just as traditional HRL does, but here the task decomposition is carried out by the agent itself according to the SBN it has built, with no dependence on an outside observer. At the same time, the Critical States, arranged in the SBN's structural layers by their distance from the goal state, provide a more detailed and complete shaping signal that covers the whole state space.
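The abstract describes Critical States only in prose; as a rough illustration of the idea, the following minimal sketch (Python, with hypothetical names not taken from the dissertation) marks a state as critical when every recorded successful trajectory passes through it, and assigns each critical state a layer index by its distance to the goal, in the spirit of the SBN layers described above.

```python
from collections import defaultdict

def find_critical_states(trajectories, goal_state):
    """Identify states that every successful trajectory passes through.

    `trajectories` is a list of state sequences that each end at `goal_state`
    (a stand-in for the State-Clusters accumulated over training episodes).
    """
    # A state is 'critical' if no successful trajectory avoids it.
    common = set(trajectories[0])
    for traj in trajectories[1:]:
        common &= set(traj)
    start_state = trajectories[0][0]
    critical = common - {goal_state, start_state}

    # Layer each critical state by its smallest observed distance to the goal,
    # mirroring the SBN layers used as the shaping signal.
    dist = {}
    for traj in trajectories:
        for i, s in enumerate(traj):
            if s in critical:
                d = len(traj) - 1 - i
                dist[s] = min(dist.get(s, d), d)
    layers = defaultdict(set)
    for s, d in dist.items():
        layers[d].add(s)
    return critical, dict(layers)

# Example: two successful episodes in a tiny toy state space.
episodes = [
    ["s0", "s1", "s3", "s5", "goal"],
    ["s0", "s2", "s3", "s4", "goal"],
]
critical, layers = find_critical_states(episodes, "goal")
print(critical)   # {'s3'} -- the one state no successful episode avoids
print(layers)     # {2: {'s3'}} -- its distance-to-goal layer
```

In this toy example "s3" would serve as the phased goal that splits the task into two sub-tasks, one from the start to "s3" and one from "s3" to the goal.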
Shaping in this way speeds up learning by reducing the state space the agent must search, much as traditional SRL does, but the shaping signal comes from the SBN the agent itself has built and is no longer dependent on an outside observer. Our main contribution is building the SBN from the accumulated State-Clusters, which enables the agent to decompose the learning task and shape its learning autonomously, so that it can not only learn by itself but also accelerate its learning by itself. Possessing this capability for self-directed task decomposition and shaping is the most basic precondition for scaling RL to complex problems with large state spaces, which are very difficult, or even impossible, for an outside observer to handle. To implement the approach, we also studied how State-Clusters can be used to speed up convergence of the value function, and how multiple agents can share their State-Clusters to make that convergence even faster. We further studied how a whole layer of Critical States in the SBN can serve as the phased goal when no single obvious Critical State exists through which every path to the goal must pass, and how gate states can be combined with Critical States to partition the original state space. We also proved that the optimal policy composed from the phased optimal policies is equivalent to the optimal policy found in the original state space, and discussed how the SBN can be used to improve several existing approaches to accelerating RL.

We verified the SBN-RL approach on a multi-intersection traffic light optimal control problem. For this verification we developed a dedicated Multi-Intersection Urban Traffic Simulator (MIUTS); the control objective is that all cars entering the city pass through and leave it in the shortest time. This is a typical multi-agent learning problem. The test results show that SBN-RL builds the SBN effectively, that the phased tasks are divided clearly, and that the agent is shaped to search a smaller state space. When the learning task is decomposed into two sub-tasks by SBN-RL, the average learning time needed to find the same optimal policy is reduced by 60% compared with traditional reinforcement learning, and the time for all cars to leave the city is reduced by 20-30% under the learned policy compared with the traditional fixed-interval traffic light control policy. SBN-RL is thus highly effective for this kind of complex multi-agent learning problem with a large state space. Because the agent can use its own knowledge and experience to build the SBN and thereby accelerate its subsequent learning, our work indeed enables the agent to accelerate its learning autonomously.
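The abstract characterizes the shaping signal only qualitatively (Critical States layered by distance to the goal). One common way to turn such a layer index into a reward bonus without changing the optimal policy is potential-based shaping; the sketch below assumes that formulation, and its function names and scale are hypothetical rather than taken from the dissertation.

```python
def shaping_bonus(layer_of, s, s_next, gamma=0.99, scale=1.0):
    """Potential-based shaping bonus derived from SBN layer indices.

    `layer_of[state]` is the state's layer in the SBN, i.e. its distance to
    the goal; closer layers get a higher potential. States not yet covered
    by the SBN fall back to the deepest known layer.
    """
    max_layer = max(layer_of.values())

    def phi(state):
        return scale * (max_layer - layer_of.get(state, max_layer))

    # F(s, s') = gamma * phi(s') - phi(s); potential-based shaping of this
    # form is known to leave the optimal policy unchanged.
    return gamma * phi(s_next) - phi(s)

# Usage inside a tabular Q-learning update (sketch):
# r_total = r_env + shaping_bonus(layer_of, s, s_next)
# Q[s][a] += alpha * (r_total + gamma * max(Q[s_next].values()) - Q[s][a])
```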
Keywords/Search Tags:Reinforcement Learning, State-Clusters, Critical State, Shaping Bayesian Network, Multi-intersection traffic light optimal control