Continuous-Time Unified MAXQ Algorithm And Its Application

Posted on:2012-04-12

Degree:Master

Type:Thesis

Country:China

Candidate:W J Liu

Full Text:PDF

GTID:2178330335961578

Subject:Computer application technology

Abstract/Summary:

PDF Full Text Request

The hierarchal reinforcement learning with abstraction mechanism can reduce the dimension of state space, so as to solve the problem of"curse of dimensionality"existing in the large-scale systems. Due to the abstraction mechanism, the hierarchal reinforcement learning can accelerate the policy learning speed and save the memory of state-action pairs. There are three typical hierarchical reinforcement learning algorithms: Option, HAM and MAXQ. However, traditional hierarchical reinforcement learning algorithms are mostly based on the framework of discrete-time SMDP model or discrete-time multi-agent SMDP model, which can not solve the continuous-time single agent, multi-agent learning system problems, and can only apply to the average criteria or discounted criteria.In this dissertation, under the framework of the concept of performance potential, combined with the existed MAXQ algorithm and continuous-time SMDP model, we introduce a continuous-time unified MAXQ algorithm under either average- or discounted-cost criteria. Because the web service composition problem can be modeled as a semi-Markov decision process model, so the proposed algorithm can be used to solve large-scale web service compositions. In addition, we use a travel reservation as an example to illustrate the high effectiveness of the proposed algorithm, and the simulation results show that, it has better optimization performance, faster learning speed and less memory than the flat Q-learning.But the capacity of a single agent is limited, more and more complex problems need to be solved by cooperating between each agent. So, combined with the concept of performance potential and the continuous-time unified MAXQ algorithm constructed before, we introduce a multi-agent continuous-time unified MAXQ algorithm under either average- or discounted-cost criteria. And then, we use the proposed algorithm to solve multi-agent continuous-time large-scale web service composition problems. Finally, this proposed algorithm is tested in a travel reservation system, and the experimental results show that it needs less memory, and has a better optimization performance and faster learning speed than selfish multi-agent algorithm and single-agent algorithm.

Keywords/Search Tags:

semi-Markov decision processes (SMDP), multi-agent semi-Markov decision process (MSMDP), performance potential, MAXQ, web service composition

PDF Full Text Request

Related items

1	Robust Control For Uncertain Semi-Markov Decision Processes Based On Performance Potentials
2	Performance Potential-based NDP Optimization Approaches And Application Research For SMDP
3	Unified Algorithms For Semi-Markov Decision Processes With Discounted And Average Criteria Based On Performance Potentials By Reinforcement Learning
4	Optimal Robust Decision Problems For Uncertain Semi-Markov Systems
5	Reinforcement Learning Algorithms For Semi-markov Decision Processes
6	Performance Sensitivity Analysis And Optimization Of Extended Markov Decision Processes
7	Asynchronous Optimization Algorithms For SMDP Based On Performance Potential
8	Research On Agent Decision Problem Based On Markov Decision Process Theory
9	Research On Soccer Robots Cooperation Mechanism Based On Markov Decision Theory
10	Semi-Markov-Based Security Effectiveness Evaluation And Defense Decision-Making For Dynamic Platform Techniques