
Dynamically Tiered Model-based Reinforcement Learning Algorithm Research

Posted on: 2012-02-28
Degree: Master
Type: Thesis
Country: China
Candidate: J H Yuan
Full Text: PDF
GTID: 2208330335990047
Subject: Control Science and Engineering
Abstract/Summary:
Reinforcement learning (RL) is an important branch of machine learning owing to its capacity for self-learning and online learning. In large-scale environments, however, RL suffers from the "curse of dimensionality" (the number of parameters to be learned grows exponentially with the dimensionality of the state variables), resulting in low learning efficiency: the agent can hardly complete its task in time and may fail to reach the goal at all. A novel RL method that copes with the curse of dimensionality in unknown large-scale worlds would therefore provide an effective way to improve the adaptability of agents in practical applications; such a study is also of significance to the development of machine learning theory and technology.

This thesis studies the combination of dynamic hierarchical RL and model-based RL in order to address the curse of dimensionality in unknown large-scale worlds and to improve the learning efficiency of the agent. Within the model-based RL process, a novel dynamic hierarchical reinforcement learning with adaptive clustering based on exploration information (DHRL-ACEI) algorithm is proposed. DHRL-ACEI builds a MAXQ hierarchy that integrates state abstraction and temporal abstraction (also called action abstraction), so it can accelerate learning remarkably by restricting the exploration range of the policy for every subtask in the hierarchy.

First, the whole state space is divided into state subspaces (regions) by an adaptive clustering algorithm driven by the exploration information gathered during model-based RL; this constitutes the state abstraction. An improved action-selection strategy is then presented, based on the sets of terminal states of these regions.
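The region-building step above can be illustrated with a small sketch. The abstract does not spell out the clustering procedure, so the following is a hypothetical leader-style adaptive clustering over per-state exploration statistics (the function name, the `features` table, and the distance threshold are all assumptions, not the thesis's actual algorithm):

```python
import math

def adaptive_cluster(states, features, threshold=1.0):
    """Leader-style adaptive clustering sketch: assign each state to the
    nearest existing region if it is close enough, otherwise open a new
    region. features[s] is a hypothetical exploration statistic
    (e.g. visit counts) for state s."""
    centroids = []   # one running-mean centroid per region
    regions = []     # list of member-state lists, parallel to centroids
    for s in states:
        f = tuple(features[s])
        best, best_d = None, math.inf
        for i, c in enumerate(centroids):
            d = math.dist(f, c)
            if d < best_d:
                best, best_d = i, d
        if best is not None and best_d <= threshold:
            regions[best].append(s)
            n = len(regions[best])
            # incremental update of the region centroid
            centroids[best] = tuple((ci * (n - 1) + fi) / n
                                    for ci, fi in zip(centroids[best], f))
        else:
            centroids.append(f)
            regions.append([s])
    return regions
```

States whose exploration statistics are similar end up in the same region, which is the kind of state abstraction the thesis feeds into the MAXQ subtasks.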
Next, a MAXQ hierarchy is constructed automatically from the frequency of successful action executions, and the regions produced by state abstraction are incorporated into the relevant MAXQ subtasks according to their sets of valid actions; in this way a MAXQ hierarchy combining state abstraction and temporal abstraction is built automatically. The recursively optimal hierarchical policy is then derived within the MAXQ framework, and the hierarchy is updated dynamically during subsequent learning so as to reduce the impact of an unreasonable initial hierarchy.

Simulation results show that the DHRL-ACEI algorithm can cope with the curse of dimensionality and significantly enhance the learning efficiency of the agent in unknown large-scale environments, demonstrating the effectiveness of the algorithm. Finally, the thesis draws conclusions and presents some issues for future research.
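The recursively optimal policy rests on the standard MAXQ value decomposition, V(i, s) = max over child actions a of [ V(a, s) + C(i, s, a) ], which the abstract references but does not spell out. A minimal sketch, assuming already-learned completion values `C` and primitive values `V_prim` (the table names and the toy hierarchy are hypothetical):

```python
def maxq_value(task, state, V_prim, C, children):
    """Recursive MAXQ value decomposition:
    - primitive task i:  V(i, s) = V_prim[i][s] (expected one-step reward)
    - composite task i:  V(i, s) = max_a [ V(a, s) + C(i, s, a) ]
    where C(i, s, a) is the completion value of task i after child a."""
    if task not in children:  # primitive action: base case
        return V_prim[task][state]
    return max(maxq_value(a, state, V_prim, C, children) + C[(task, state, a)]
               for a in children[task])
```

For example, a root task with two primitive children `left` and `right` simply picks the child maximizing child value plus completion value; restricting `children[task]` per subtask is what lets the hierarchy shrink each subtask's exploration range.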
Keywords: agent, model-based reinforcement learning, adaptive clustering, dynamic hierarchical reinforcement learning, MAXQ