Continuous Time Hierarchical Reinforcement Learning Algorithm

Posted on: 2011-11-18
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Zhang
GTID: 2178360308973203
Subject: Computer application technology
Abstract/Summary:
Hierarchical reinforcement learning (HRL) methods such as Option and MAXQ introduce abstraction mechanisms that mitigate the curse of dimensionality and accelerate policy learning. The Option algorithm is a widely applied HRL method: it decomposes a task into sub-tasks by means of macro actions and makes a new decision only once the current sub-task has been carried out. The traditional Option algorithm is based on the discrete-time semi-Markov decision process with the discounted criterion, so it cannot be applied to a wide class of continuing tasks in continuous time. Taking into account the time-accumulated reward of practical models and the advantages of the average criterion in a wide class of tasks, this thesis focuses on a unified Option algorithm that applies to either the average or the discounted criterion. The algorithm can solve a wide class of continuous-time continuing tasks in single-agent or multi-agent systems.

The thesis first takes the single-agent setting as its research background. It introduces a continuous-time hierarchical reinforcement learning model and proposes a unified Option algorithm that applies to either the average or the discounted criterion. The algorithm is built on the framework of performance potentials and the continuous-time semi-Markov decision process, and it can solve a wide class of continuous-time continuing tasks. The proposed hierarchical reinforcement learning optimization algorithm is tested on a robotic garbage collection system, and the experimental results show that it needs less memory, achieves better optimization performance, and learns faster than a continuous-time simulated annealing Q-learning algorithm.

The thesis then takes the multi-agent setting as its research background. It introduces a continuous-time multi-agent hierarchical reinforcement learning model and proposes an Option algorithm that applies to either the average or the discounted criterion. The algorithm is built on the framework of the continuous-time multi-agent semi-Markov decision process and introduces a method for communicating macro actions between agents at the top level, which allows it to solve a wide class of continuous-time multi-agent continuing tasks. The proposed algorithm is tested on a multi-agent robotic garbage collection system, and the experimental results show that it needs less memory, achieves better optimization performance, and learns faster than a multi-agent continuous-time Option algorithm that uses joint states and joint macro actions at the top level.
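As a rough illustration of how a single update rule can serve both criteria in continuous time, the sketch below shows a textbook-style semi-Markov Q-learning update over options. It is not the thesis's algorithm, which the abstract does not spell out. The function names, the dictionary-based Q-table, and the assumption that the reward rate stays constant while an option runs are all hypothetical simplifications. With beta > 0 and rho = 0 the update follows the discounted criterion. With beta = 0 and rho set to an estimate of the average reward rate it follows the average criterion.

```python
import math


def discounted_duration(tau, beta):
    """Integral of exp(-beta * t) over [0, tau]; equals tau when beta == 0."""
    if beta == 0.0:
        return tau
    return (1.0 - math.exp(-beta * tau)) / beta


def option_q_update(q, state, option, reward_rate, tau, next_state, options,
                    alpha=0.1, beta=0.0, rho=0.0):
    """One Q-learning step after an option ran for sojourn time tau.

    Assumption: reward accrues at the constant rate `reward_rate` while the
    option executes. `rho` is an externally maintained estimate of the
    average reward rate (use 0.0 under the discounted criterion).
    """
    d = discounted_duration(tau, beta)
    # Value of the best option available in the next state (0.0 if unseen).
    best_next = max(q.get((next_state, o), 0.0) for o in options)
    # Reward earned relative to rho, plus the discounted continuation value.
    target = (reward_rate - rho) * d + math.exp(-beta * tau) * best_next
    old = q.get((state, option), 0.0)
    q[(state, option)] = old + alpha * (target - old)
    return q[(state, option)]


# Hypothetical usage: an option "collect" ran for 3 time units at reward rate 2.
q_table = {}
new_value = option_q_update(q_table, "s0", "collect", reward_rate=2.0, tau=3.0,
                            next_state="s1", options=["collect", "return"],
                            alpha=0.5, beta=0.0, rho=1.0)
```

Setting beta to 0 works here because the discounted duration tends to tau as beta tends to 0, so the average-criterion update appears as the limiting case of the discounted one.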
Keywords/Search Tags: semi-Markov decision process, multi-agent system, performance potential, Q-learning, hierarchical reinforcement learning, Option