Autonomous Mission Decomposition Based On Hierarchical Reinforcement Learning

Posted on: 2007-10-24  Degree: Master  Type: Thesis
Country: China  Candidate: Z Y Tang
GTID: 2178360185975659  Subject: Computer application technology
Abstract/Summary:
Reinforcement learning (RL) is an effective method for solving planning problems in stochastic environments. However, in large state spaces, especially in applications with complex stochastic states, the "curse of dimensionality" has not yet been solved. Hierarchical reinforcement learning (HRL) extends RL with abstraction over both the state space and the action space. It has been shown to be more effective for large-scale stochastic control problems and has been applied to AGV (automated guided vehicle) navigation. Yet in almost all existing work the hierarchical structure is designed in advance, and relatively little research has addressed discovering or creating useful hierarchies autonomously. Motivated by this gap, the thesis investigates the following aspects.
The theoretical background and development of hierarchical reinforcement learning are introduced. Three typical RL algorithms are discussed and compared. Empirical results are presented to show their differences and characteristics, and these results provide a basis for choosing a suitable algorithm in the subsequent work.
Methods for autonomously finding useful subgoals are then studied in two kinds of environment. For simple environments, McGovern's method learns the model too slowly. The thesis therefore proposes an Actor-Critic method based on the Boltzmann distribution to learn the environment instead. To create subgoals autonomously, the thesis first analyzes the properties of the learned model and proposes a new concept, the frequency change ratio. The state with the maximum frequency change ratio under the learned policy model is then chosen as a subgoal. For relatively complex environments, a heuristic method is used to generate the subgoal action sequence. States that do not lie on the successful path are then deleted, and the remaining states form the useful subgoals.
Finally, the thesis improves the method McGovern used to form the hierarchical policy.
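The following Python sketch is for illustration only. It combines the two ingredients described above for simple environments. The first is a standard tabular one-step actor-critic whose actor samples actions from a Boltzmann (softmax) distribution. The second is a subgoal picker based on a frequency change ratio. The abstract does not define that ratio, so the definition used here is a hypothetical stand-in: the fraction of late-phase episodes that visit a state, divided by the fraction of early-phase episodes that visit it. The names `ActorCritic`, `pick_subgoal`, and the toy trajectories are also hypothetical, and none of this is the thesis's own implementation.

```python
import math
import random
from collections import defaultdict


def boltzmann_probs(prefs, temperature=1.0):
    """Boltzmann (softmax) distribution over action preferences."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp((p - m) / temperature) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


class ActorCritic:
    """Tabular one-step actor-critic whose actor samples from a Boltzmann distribution."""

    def __init__(self, n_actions, alpha=0.1, beta=0.1, gamma=0.95, temperature=1.0, seed=0):
        self.n_actions = n_actions
        self.alpha, self.beta, self.gamma = alpha, beta, gamma  # critic step, actor step, discount
        self.temperature = temperature
        self.V = defaultdict(float)                          # critic: state-value estimates
        self.prefs = defaultdict(lambda: [0.0] * n_actions)  # actor: per-state action preferences
        self.rng = random.Random(seed)

    def act(self, s):
        probs = boltzmann_probs(self.prefs[s], self.temperature)
        return self.rng.choices(range(self.n_actions), weights=probs)[0]

    def update(self, s, a, r, s_next, done):
        target = r if done else r + self.gamma * self.V[s_next]
        td_error = target - self.V[s]
        self.V[s] += self.alpha * td_error        # critic: move V(s) toward the TD target
        self.prefs[s][a] += self.beta * td_error  # actor: reinforce a in proportion to the TD error
        return td_error


def visit_frequency(trajectories):
    """Fraction of trajectories (lists of states) that visit each state at least once."""
    counts = defaultdict(int)
    for traj in trajectories:
        for s in set(traj):
            counts[s] += 1
    return {s: c / len(trajectories) for s, c in counts.items()}


def frequency_change_ratio(early, late):
    """HYPOTHETICAL definition: late-phase visit frequency divided by early-phase frequency."""
    f_early, f_late = visit_frequency(early), visit_frequency(late)
    return {s: f_late.get(s, 0.0) / f for s, f in f_early.items()}


def pick_subgoal(early, late, exclude=()):
    """Choose the state with the largest ratio, skipping excluded states such as start and goal."""
    ratios = {s: v for s, v in frequency_change_ratio(early, late).items() if s not in exclude}
    return max(ratios, key=ratios.get)


# Toy usage with hand-made trajectories (hypothetical data, not results from the thesis).
early = [["S", "a", "d", "G"], ["S", "b", "S", "a"], ["S", "b", "c"]]
late = [["S", "a", "d", "G"], ["S", "a", "d", "G"]]
print(pick_subgoal(early, late, exclude={"S", "G"}))  # prints d
```

In this toy data, state `d` is visited in one of three early episodes but in every late episode, so its ratio (3.0) is the largest once the start and goal states are excluded. Excluding those two states follows a common convention in subgoal discovery, because both trivially appear on every successful path.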
First, a class of SARSA algorithms based on heuristic search is proposed to determine which action or option the agent selects. The new subgoals are then added to the original action set as abstract actions to form the hierarchical policy.
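The step of adding subgoals as abstract actions can be sketched as SMDP-style SARSA over an action set that is extended with one option. Everything in this Python sketch is a hypothetical toy:

- an 8-state corridor;
- a single subgoal at state 4;
- an option `to_subgoal` that walks to that subgoal;
- a distance-to-goal `heuristic`.

The abstract does not say how heuristic search enters its SARSA variant. Here the heuristic is assumed, purely for illustration, to act as a one-step look-ahead that breaks ties among equally valued actions.

```python
import random
from collections import defaultdict

# HYPOTHETICAL toy task: 1-D corridor with states 0..7, goal at the right end.
N_STATES, GOAL = 8, 7
SUBGOAL = 4                    # hypothetical subgoal, e.g. one found by a discovery step
GAMMA, ALPHA, EPSILON = 0.9, 0.5, 0.1
PRIMITIVES = {"left": -1, "right": +1}
OPTION = "to_subgoal"          # abstract action added to the original action set


def available_actions(s):
    # The option may only be initiated when the agent is not already at its subgoal.
    return list(PRIMITIVES) + ([OPTION] if s != SUBGOAL else [])


def step(s, a):
    """One primitive move, clipped to the corridor; reward -1 per step."""
    return min(max(s + PRIMITIVES[a], 0), N_STATES - 1), -1.0


def execute(s, a):
    """Run a primitive action or the option; return (next state, discounted reward, duration k)."""
    if a in PRIMITIVES:
        s_next, r = step(s, a)
        return s_next, r, 1
    total, k = 0.0, 0
    while s != SUBGOAL:  # the option's internal policy walks straight to the subgoal
        s, r = step(s, "right" if SUBGOAL > s else "left")
        total += (GAMMA ** k) * r
        k += 1
    return s, total, k


def heuristic(s):
    # HYPOTHETICAL heuristic: remaining distance to the goal (smaller is better).
    return abs(GOAL - s)


def select_action(Q, s, rng, epsilon=EPSILON):
    actions = available_actions(s)
    if rng.random() < epsilon:
        return rng.choice(actions)
    best = max(Q[(s, a)] for a in actions)
    ties = [a for a in actions if Q[(s, a)] == best]
    # Assumed use of heuristic search: one-step look-ahead breaks ties among equal Q-values.
    return min(ties, key=lambda a: heuristic(execute(s, a)[0]))


def sarsa_update(Q, s, a, reward, k, s_next, a_next, done):
    # SMDP form: bootstrap with GAMMA**k because an option may last k primitive steps.
    target = reward if done else reward + (GAMMA ** k) * Q[(s_next, a_next)]
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])


def train(episodes=200, seed=0, max_steps=100):
    rng = random.Random(seed)
    Q = defaultdict(float)
    for _ in range(episodes):
        s = 0
        a = select_action(Q, s, rng)
        for _ in range(max_steps):
            s_next, reward, k = execute(s, a)
            done = s_next == GOAL
            a_next = None if done else select_action(Q, s_next, rng)
            sarsa_update(Q, s, a, reward, k, s_next, a_next, done)
            if done:
                break
            s, a = s_next, a_next
    return Q


Q = train()
```

The update uses `GAMMA ** k` rather than `GAMMA` because an option can run for several primitive steps. This is the usual semi-Markov treatment of options, and it lets primitive actions and abstract actions share one Q-table.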
Keywords/Search Tags: autonomous mission decomposition, hierarchical reinforcement learning, heuristic search, hierarchical policy, abstract