Continuous Time Hierarchical Reinforcement Learning Algorithm

Posted on:2011-11-18

Degree:Master

Type:Thesis

Country:China

Candidate:X Y Zhang

Full Text:PDF

GTID:2178360308973203

Subject:Computer application technology

Abstract/Summary:


Hierarchical reinforcement learning (HRL) methods such as Option and MAXQ introduce abstraction mechanisms that can mitigate the curse of dimensionality and accelerate policy learning. The Option algorithm is a widely applied HRL method: it decomposes a task into sub-tasks using macro actions and does not make a new decision until the current sub-task has been completed. The traditional Option algorithm is based on a discrete-time semi-Markov decision process (SMDP) with a discounted criterion, and it cannot be applied to the wide class of continuing tasks in continuous time. This thesis therefore accounts for the reward accumulated over time in practical models and for the advantages of the average criterion in a wide class of tasks, and focuses on a unified Option algorithm that applies to both the average and the discounted criteria. The algorithm can solve a wide class of continuing continuous-time tasks in single-agent or multi-agent systems.

First, taking the single-agent setting as the research background, the thesis introduces a continuous-time hierarchical reinforcement learning model and proposes a unified Option algorithm that applies to the average or discounted criterion. The algorithm is built on the framework of performance potentials and continuous-time SMDPs, and it can solve a wide class of continuing continuous-time tasks. Finally, the proposed algorithm is tested on a robotic garbage collection system; the experimental results show that it requires less memory, achieves better optimization performance, and learns faster than a continuous-time simulated annealing Q-learning algorithm.

Then, taking the multi-agent setting as the research background, the thesis introduces a continuous-time multi-agent hierarchical reinforcement learning model and proposes an Option algorithm that applies to the average or discounted criterion.
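To make the discounted versus average distinction in the single-agent algorithm more concrete, here is a minimal Python sketch of an SMDP Q-learning update over options in continuous time. It is not the thesis's performance-potential algorithm; it is a standard textbook-style stand-in. The class name `ContinuousTimeOptionQ`, its parameters (`alpha` as a continuous discount rate, with `alpha == 0` selecting the average criterion; `lr`; `epsilon`), and the assumption that reward accrues at a constant rate during an option's sojourn time are all hypothetical choices made for illustration.

```python
import math
import random
from collections import defaultdict


class ContinuousTimeOptionQ:
    """Hypothetical sketch: SMDP Q-learning over options in continuous time.

    alpha > 0  -> discounted criterion with continuous discount rate alpha.
    alpha == 0 -> average criterion; rho estimates reward per unit time.
    Assumes reward accrues at a constant rate while an option runs.
    """

    def __init__(self, options, alpha=0.0, lr=0.1, epsilon=0.1):
        self.options = list(options)
        self.alpha = alpha
        self.lr = lr
        self.epsilon = epsilon
        self.q = defaultdict(float)  # Q[(state, option)], default 0.0
        self.rho = 0.0               # average reward rate estimate
        self.total_reward = 0.0
        self.total_time = 0.0

    def select(self, state):
        """Epsilon-greedy choice of the next option (macro action)."""
        if random.random() < self.epsilon:
            return random.choice(self.options)
        return max(self.options, key=lambda o: self.q[(state, o)])

    def update(self, s, o, reward_rate, tau, s_next):
        """Update Q after option o ran for sojourn time tau from s to s_next."""
        best_next = max(self.q[(s_next, o2)] for o2 in self.options)
        if self.alpha > 0:
            decay = math.exp(-self.alpha * tau)
            # Integral of exp(-alpha * t) * reward_rate over [0, tau].
            reward = reward_rate * (1.0 - decay) / self.alpha
            target = reward + decay * best_next
        else:
            reward = reward_rate * tau
            # Average criterion: charge rho per unit of elapsed time.
            target = reward - self.rho * tau + best_next
            # Simplification: rho = cumulative reward / cumulative time.
            self.total_reward += reward
            self.total_time += tau
            self.rho = self.total_reward / self.total_time
        self.q[(s, o)] += self.lr * (target - self.q[(s, o)])
```

Under the discounted criterion the reward collected over a sojourn time τ is discounted continuously, giving r(1 - e^{-ατ})/α; as α tends to 0 this tends to rτ, which is the quantity the average branch uses after subtracting ρτ. Updating ρ on every step, rather than only after greedy choices, is a simplification of this sketch.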
The algorithm is built on the framework of continuous-time multi-agent semi-Markov decision processes and introduces a method of macro-action communication between agents at the top level, which allows it to solve a wide class of continuing continuous-time multi-agent tasks. Finally, the proposed algorithm is tested on a multi-agent robotic garbage collection system; the experimental results show that it requires less memory, achieves better optimization performance, and learns faster than a multi-agent continuous-time Option algorithm that uses the joint state and joint macro action at the top level.
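The top-level communication idea can be illustrated with a hypothetical Python sketch. Here each agent conditions its macro-action choice on its own state plus the macro actions the other agents have announced, instead of on the joint state and joint macro action. The class `CommunicatingAgent`, the Q-table key layout, `top_level_step`, and `table_sizes` are assumptions for illustration only; the learning update is omitted, and the size arithmetic merely shows why such a design can need less memory, not the figures reported in the thesis.

```python
import random
from collections import defaultdict


class CommunicatingAgent:
    """Hypothetical top-level learner for one agent.

    Q-table key: (own state, macro actions announced by others, own macro action),
    instead of (joint state, joint macro action). Key layout is an assumption.
    """

    def __init__(self, name, macro_actions, epsilon=0.1):
        self.name = name
        self.macro_actions = list(macro_actions)
        self.epsilon = epsilon
        self.q = defaultdict(float)

    def choose(self, own_state, announced):
        """announced: dict mapping other agents' names to their current macro action."""
        context = tuple(sorted(announced.items()))
        if random.random() < self.epsilon:
            return random.choice(self.macro_actions)
        return max(self.macro_actions,
                   key=lambda m: self.q[(own_state, context, m)])


def top_level_step(agents, states, current):
    """Each agent picks a macro action after reading what the others announced."""
    for agent in agents:
        others = {a.name: current[a.name] for a in agents if a is not agent}
        current[agent.name] = agent.choose(states[agent.name], others)
    return current


def table_sizes(n_agents, n_local_states, n_macro):
    """Illustrative entry counts: one joint table vs. one table per agent."""
    joint = (n_local_states ** n_agents) * (n_macro ** n_agents)
    # Per agent: own state x others' macro actions x own macro action.
    communicating = n_agents * n_local_states * (n_macro ** n_agents)
    return joint, communicating
```

For example, with 2 agents, 10 local states each and 3 macro actions, `table_sizes(2, 10, 3)` gives 900 entries for the joint table versus 180 for the communicating agents, because the joint design multiplies the local state spaces together while the communicating design does not.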

Keywords/Search Tags:

semi-Markov decision process, multi-agent system, performance potential, Q-learning, hierarchical reinforcement learning, Option


Related items

1	Performance Potential-based NDP Optimization Approaches And Application Research For SMDP
2	Inverse Reinforcement Learning Algorithms In Semi-markov Environment
3	Continuous-Time Unified MAXQ Algorithm And Its Application
4	Unified Algorithms For Semi-Markov Decision Processes With Discounted And Average Criteria Based On Performance Potentials By Reinforcement Learning
5	Research Of Multi-agent Cooperation Mechanism Based On Reinforcement Learning
6	Research On Reinforcement Learning Based Communication Jamming Strategy Learning Methods
7	Parallel Algorithms For Large-Scale Markov Decision Processes Based On Performance Potentials
8	Study On The Learning And Planning Algorithm Of Intelligent Agent Based On Performance Potentials
9	Look-ahead Control For CSPS Model Based On Learning
10	Multi Agent Path Planning And Formation Based On Hierarchical Reinforcement Learning