
Research on Hierarchical Reinforcement Learning Based on Abstract Actions

Posted on: 2017-04-22
Degree: Master
Type: Thesis
Country: China
Candidate: Z P Xu
Full Text: PDF
GTID: 2308330488961930
Subject: Software engineering
Abstract/Summary:
Reinforcement learning has a strong capacity for autonomous learning in complex systems and has been widely applied in practice. Its development, however, is hampered by the "curse of dimensionality". Hierarchical reinforcement learning decomposes the learning task into multiple subtasks and solves them separately, which effectively mitigates this problem. The Option framework is one of the three major hierarchical reinforcement learning frameworks; building on it, this thesis proposes several hierarchical reinforcement learning methods for control optimization and automatic abstraction. The main contributions are as follows:

i. To address the problem that traditional abstract-action-based methods cannot handle learning and control in dynamic environments, we propose an online learning algorithm with interrupting abstract actions, called IMQ, and prove its convergence theoretically. IMQ can cope with large-scale problems that traditional reinforcement learning methods are unable to handle. By combining the idea of interruption with the characteristics of dynamic environments, IMQ improves the efficiency of learning and of the control strategy in such environments.

ii. Regarding the problem that identifying sub-goals takes a long time because of heavy trajectory-sampling noise in diverse-density-based abstraction discovery, we propose a new algorithm that autonomously discovers abstract actions from acyclic state trajectories under the diverse-density metric. By reducing the noise in the trajectory samples, the algorithm effectively shortens the learning time and improves the discovered abstract actions. It avoids the heavy computation caused by excessive sampling, not only reducing the time needed to identify sub-goals but also discovering better abstract actions, which improves the learning efficiency of the algorithm.

iii. In view of the problem that traditional DT-SMDP-based automatic hierarchical methods cannot be applied directly to continuous-time tasks, we put forward a new CT-SMDP-based automatic hierarchical reinforcement learning method for finite continuous-time tasks. The algorithm achieves good control and learning performance on continuous-time tasks.
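The interruption idea behind IMQ (item i) can be illustrated with a minimal sketch — this is an assumption-laden toy, not the thesis's actual algorithm: SMDP Q-learning over options in a one-dimensional corridor, where an executing option is abandoned as soon as another option's estimated value at the current state is strictly higher. The environment, the two "move right" options, and all names are invented for illustration.

```python
import random

N_STATES = 8            # corridor states 0..7, goal at 7
GAMMA = 0.9             # discount factor
ALPHA = 0.1             # learning rate

# Hypothetical options: each tries to move right a fixed number of steps.
OPTIONS = {"right1": 1, "right4": 4}

def step(s):
    """Primitive transition: move one cell right; reaching the goal pays 1."""
    s2 = min(s + 1, N_STATES - 1)
    done = s2 == N_STATES - 1
    return s2, (1.0 if done else 0.0), done

def q_learn_interrupting(episodes=2000, seed=1, epsilon=0.1):
    rng = random.Random(seed)
    Q = {(s, o): 0.0 for s in range(N_STATES) for o in OPTIONS}
    for _ in range(episodes):
        s = 0
        while s != N_STATES - 1:
            # epsilon-greedy choice over options
            if rng.random() < epsilon:
                o = rng.choice(list(OPTIONS))
            else:
                o = max(OPTIONS, key=lambda opt: Q[(s, opt)])
            s0, r_sum, disc, k = s, 0.0, 1.0, 0
            while k < OPTIONS[o]:
                s, r, done = step(s)
                r_sum += disc * r   # accumulate discounted reward inside the option
                disc *= GAMMA
                k += 1
                if done:
                    break
                # interruption: abandon o if another option now looks better here
                if max(Q[(s, o2)] for o2 in OPTIONS) > Q[(s, o)]:
                    break
            best_next = 0.0 if s == N_STATES - 1 else max(Q[(s, o2)] for o2 in OPTIONS)
            # SMDP Q-learning update: discount by gamma^k, k = steps actually taken
            Q[(s0, o)] += ALPHA * (r_sum + disc * best_next - Q[(s0, o)])
    return Q
```

States adjacent to the goal converge toward value 1, while the start state's value is bounded by the multi-step discount, so interruption never hurts: cutting an option short simply re-opens the choice at the current state.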
Keywords/Search Tags: hierarchical reinforcement learning, Option, abstraction, automatic hierarchy, automatic discovery
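The diverse-density intuition behind the sub-goal discovery of item ii can also be made concrete with a toy sketch — again my own crude formulation under stated assumptions, not the thesis's algorithm: a state that appears in many successful trajectories but few unsuccessful ones is a sub-goal candidate (e.g., a doorway between two rooms). Deduplicating each trajectory with `set` loosely echoes the thesis's use of acyclic trajectories to suppress sampling noise. All state names and trajectories below are invented.

```python
from collections import Counter

def diverse_density_scores(pos_bags, neg_bags):
    """Crude diverse-density-style score per state:
    (fraction of successful trajectories containing it)
    - (fraction of unsuccessful trajectories containing it).
    Each trajectory is deduplicated, so revisits (cycles) count once."""
    pos_count = Counter(s for bag in pos_bags for s in set(bag))
    neg_count = Counter(s for bag in neg_bags for s in set(bag))
    states = set(pos_count) | set(neg_count)
    return {s: pos_count[s] / len(pos_bags)
               - neg_count[s] / max(1, len(neg_bags))
            for s in states}

# Toy two-room data: every successful trajectory passes through 'door'.
pos = [["start", "a", "door", "goal"],
       ["start", "b", "door", "goal"],
       ["start", "door", "door", "c", "goal"]]   # cycle at 'door', deduped
neg = [["start", "a", "b", "a"],                 # wandering in the first room
       ["start", "b", "a"]]

scores = diverse_density_scores(pos, neg)
```

Here `scores["door"]` is 1.0 while `scores["start"]` is 0.0, since the start state occurs in failures as well. (The goal state trivially scores as high as the doorway; a real discovery method filters terminal states before picking the peak.)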