Font Size: a A A

Continuous-Time Unified MAXQ Algorithm And Its Application

Posted on:2012-04-12Degree:MasterType:Thesis
Country:ChinaCandidate:W J LiuFull Text:PDF
GTID:2178330335961578Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
The hierarchal reinforcement learning with abstraction mechanism can reduce the dimension of state space, so as to solve the problem of"curse of dimensionality"existing in the large-scale systems. Due to the abstraction mechanism, the hierarchal reinforcement learning can accelerate the policy learning speed and save the memory of state-action pairs. There are three typical hierarchical reinforcement learning algorithms: Option, HAM and MAXQ. However, traditional hierarchical reinforcement learning algorithms are mostly based on the framework of discrete-time SMDP model or discrete-time multi-agent SMDP model, which can not solve the continuous-time single agent, multi-agent learning system problems, and can only apply to the average criteria or discounted criteria.In this dissertation, under the framework of the concept of performance potential, combined with the existed MAXQ algorithm and continuous-time SMDP model, we introduce a continuous-time unified MAXQ algorithm under either average- or discounted-cost criteria. Because the web service composition problem can be modeled as a semi-Markov decision process model, so the proposed algorithm can be used to solve large-scale web service compositions. In addition, we use a travel reservation as an example to illustrate the high effectiveness of the proposed algorithm, and the simulation results show that, it has better optimization performance, faster learning speed and less memory than the flat Q-learning.But the capacity of a single agent is limited, more and more complex problems need to be solved by cooperating between each agent. So, combined with the concept of performance potential and the continuous-time unified MAXQ algorithm constructed before, we introduce a multi-agent continuous-time unified MAXQ algorithm under either average- or discounted-cost criteria. And then, we use the proposed algorithm to solve multi-agent continuous-time large-scale web service composition problems. Finally, this proposed algorithm is tested in a travel reservation system, and the experimental results show that it needs less memory, and has a better optimization performance and faster learning speed than selfish multi-agent algorithm and single-agent algorithm.
Keywords/Search Tags:semi-Markov decision processes (SMDP), multi-agent semi-Markov decision process (MSMDP), performance potential, MAXQ, web service composition
PDF Full Text Request
Related items