
Multi-Agent Dynamic Hierarchical Reinforcement Learning Based On Hybrid Abstraction

Posted on: 2013-07-30
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C. H. Dai
Full Text: PDF
GTID: 1228330374488148
Subject: Control Science and Engineering

Abstract/Summary:
Reinforcement Learning (RL) is well suited to Multi-Agent Systems (MASs) because of its capacity for self-learning and online learning. However, RL suffers from the "curse of dimensionality": it demands large amounts of memory and computation time, and when RL is applied to a MAS, the sizes of the state and action spaces grow exponentially with the number of agents, making learning intolerably slow. A multi-agent RL method that remains effective in large-scale, unknown environments would therefore provide a practical way to improve the adaptability of MASs in applications, and such research is of great significance to the development of MAS theory and technology.

To improve the learning efficiency of MASs, this dissertation incorporates model-based RL into hierarchical RL and develops a static hierarchical RL algorithm, Bayesian-MAXQ, which integrates Bayesian learning with MAXQ learning. Building on this, a dynamic hierarchical RL algorithm based on a probability model, DHRL-Model, is proposed through state abstraction. Finally, based on an analysis of the characteristics of multi-agent RL and an extension of DHRL-Model, a new algorithm, multi-agent dynamic hierarchical reinforcement learning with adaptive clustering based on exploration information (MADHRL-ACEI), is proposed. The main contributions of this dissertation are:

(1) A feasibility analysis of dynamic hierarchical reinforcement learning in large-scale, unknown environments.

Starting from the nature of RL, the root cause of the curse of dimensionality is analyzed.
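As a toy illustration of the exponential growth discussed above (the agent counts and per-agent space sizes here are illustrative, not taken from the dissertation):

```python
# Toy illustration: the joint state-action space of a multi-agent system
# grows exponentially with the number of agents, since every combination
# of individual states and actions is a distinct joint configuration.

def joint_space_size(n_agents, states_per_agent=25, actions_per_agent=4):
    """Size of the joint state-action space for n homogeneous agents."""
    return (states_per_agent ** n_agents) * (actions_per_agent ** n_agents)

for n in (1, 2, 3, 4):
    print(f"{n} agent(s): {joint_space_size(n):,} joint state-action pairs")
```

Even with a modest 25 states and 4 actions per agent, the joint space grows by a factor of 100 per additional agent, which is why flat tabular RL becomes intractable for MASs.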
Then, by comparing different hierarchical RL algorithms, the benefits and drawbacks of state abstraction and action abstraction in dealing with the curse of dimensionality are explained, as are the drawbacks of state-hierarchical RL in dynamic, unknown environments. Based on the characteristics of MAXQ, the feasibility of incorporating model-based RL into MAXQ through state abstraction and action abstraction is discussed, together with the key problems of realizing a dynamic hierarchy within the MAXQ framework.

(2) A static hierarchical algorithm, Bayesian-MAXQ, based on model-based RL.

To combine the efficiency of model-based RL (e.g., Bayesian Q-learning) with the online learning ability of MAXQ, the dissertation explores how to integrate Bayesian-Q with MAXQ, solving the problems of recording Bayesian-Q data within the hierarchy and of iterating the value function. The main improvements are: adding the relationships between state-action pairs at the same level; establishing the forward and backward topological relations of state transitions; modifying the priority-calculation equation of Prioritized Sweeping; and solving the value function by dynamic programming. On this basis, the static hierarchical algorithm Bayesian-MAXQ is proposed, and its validity is verified on the taxi problem.

(3) A dynamic hierarchical RL algorithm based on a probability model, DHRL-Model.

Although Bayesian-MAXQ improves efficiency in large-scale environments, it applies only when the hierarchical structure is known to the learner. For large-scale, unknown environments, a new MAXQ variant with a dynamic hierarchy is proposed using state abstraction. Sub-goal states are identified automatically by state clustering, and from the resulting sets of sub-goal states a MAXQ-like hierarchy is generated and updated dynamically during learning.
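The dissertation's exact clustering procedure is not given in this abstract; as a hedged sketch, a common stand-in for automatic sub-goal identification is a bottleneck heuristic that flags states visited in most successful trajectories:

```python
# Hedged sketch (NOT the dissertation's procedure): identify sub-goal
# candidates as states that recur across many successful trajectories --
# a simple bottleneck heuristic standing in for the state-clustering
# step described above. Thresholds and data layout are assumptions.
from collections import Counter

def subgoal_candidates(trajectories, ratio=0.5):
    """Return states visited in at least `ratio` of the trajectories."""
    counts = Counter()
    for traj in trajectories:
        counts.update(set(traj))  # count each state once per trajectory
    cutoff = ratio * len(trajectories)
    return {s for s, c in counts.items() if c >= cutoff}
```

States returned by such a heuristic (e.g., doorways in a grid world) can then serve as termination conditions for the subtasks of a MAXQ-like hierarchy.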
On this dynamic structure, the Bayesian-MAXQ algorithm is used to compute the recursively optimal policy in the solution space. The resulting algorithm, DHRL-Model, improves learning efficiency remarkably in unknown environments.

(4) A multi-agent dynamic hierarchical RL algorithm with adaptive clustering based on exploration information.

To cope with the severe curse of dimensionality in MASs and to improve their learning efficiency in unknown, complex environments, the dissertation proposes the MADHRL-ACEI algorithm. Building on the action-abstraction algorithm based on the return cycle and the state-clustering algorithm of DHRL-Model, it generates a MAXQ-like hierarchy automatically and optimizes its structure dynamically. The proposed algorithm obtains a cooperative, recursively optimal policy over the dynamic hierarchy, reduces the search space drastically, and accelerates learning. Simulation results show that the proposed algorithm clearly improves learning efficiency and relieves a bottleneck in the practical application of MASs.
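The recursively optimal policy mentioned above is evaluated through the standard MAXQ value decomposition, Q(i, s, a) = V(a, s) + C(i, s, a); a minimal sketch, with the value and completion tables represented as plain dictionaries (the data layout and task names are illustrative, not from the dissertation):

```python
# Minimal sketch of the MAXQ value decomposition evaluated over a task
# hierarchy: the value of a composite task is the best child's value
# plus the completion value of finishing the parent after that child.

def q_value(V, C, task, state, action):
    """Q(task, s, a) = V(a, s) + C(task, s, a)."""
    return V[(action, state)] + C[(task, state, action)]

def v_value(V, C, task, state, children):
    """V(task, s) = max over child actions of Q(task, s, a)."""
    return max(q_value(V, C, task, state, a) for a in children)
```

In Bayesian-MAXQ these quantities would be maintained from a learned model rather than hand-set tables, but the recursion is the same.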
Keywords/Search Tags: Multi-agent system, Dynamic hierarchical reinforcement learning, Model-based reinforcement learning, Adaptive state clustering, Action abstraction based on the return cycle