The Reinforcement Learning Research Based On Internal State In Partially Observable Markov Decision Processes

Posted on:2009-03-14

Degree:Master

Type:Thesis

Country:China

Candidate:C S Fang

Full Text:PDF

GTID:2178360245971547

Subject:Computer application technology

Abstract/Summary:

Reinforcement learning (RL) is an important branch of machine learning. RL learns a policy that maps states to actions by trial and error, so that the agent learns how to act in every state, which can improve the adaptability and robustness of AI systems. In spite of some achievements in this area, many problems remain unsolved, and partial observability is one of them. The partially observable Markov decision process (POMDP) is an ideal model for this kind of problem. However, modeling with POMDPs comes at a price: exact methods for solving them are computationally very expensive and are therefore applicable in practice only to very simple problems. This dissertation presents work on optimizing the model and improving the algorithms. The main contributions are as follows.

Firstly, the internal state of the agent is introduced into the POMDP model so that the agent's experience can be used, and an improved POMDP reinforcement learning model based on internal state is proposed. An example policy description indicates that policy complexity is reduced and learning efficiency is improved.

Secondly, the eligibility trace is introduced into the model, and the PGI-POMDP algorithm, an approximate reinforcement learning algorithm based on policy gradient methods, is proposed. The results show that PGI-POMDP reduces computational complexity and improves computational efficiency.

Thirdly, the method is applied to multi-agent systems (MAS), and the MIS-GPOMDP algorithm, a policy gradient method for MAS, is proposed. The experimental results show that learning efficiency improves and that time and space costs are reduced.
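The combination the abstract describes (a policy gradient estimate, an eligibility trace, and an internal state) can be illustrated with a small sketch. This is not the dissertation's PGI-POMDP or MIS-GPOMDP algorithm, whose details are not given here. It is the generic GPOMDP-style estimator applied to a hypothetical cue-recall task, and it assumes the internal state is simply the previous observation. The task, the names `rollout` and `train`, and all parameter values are assumptions made for illustration.

```python
import math
import random
from collections import defaultdict

N_ACTIONS = 2
BLANK = 2  # observation shown on recall steps; it says nothing about the cue


def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def new_table():
    # One preference (or trace, or gradient) vector per (observation, internal state).
    return defaultdict(lambda: [0.0] * N_ACTIONS)


def rollout(theta, steps, beta, rng):
    """Run the toy task once; return a GPOMDP-style gradient estimate and average reward."""
    z = new_table()      # eligibility trace
    grad = new_table()   # running average of reward * trace
    internal = BLANK     # assumed internal state: the previous observation
    pending = 0.0        # reward earned now is paid one step later
    total_reward = 0.0
    for t in range(steps):
        cue_step = t % 2 == 0
        obs = rng.randrange(2) if cue_step else BLANK
        key = (obs, internal)
        probs = softmax(theta[key])
        action = 0 if rng.random() < probs[0] else 1

        reward = pending
        # On a recall step the correct action equals the cue held in the internal state.
        pending = 1.0 if (not cue_step and action == internal) else 0.0
        total_reward += reward

        # z <- beta * z + grad log pi(action | obs, internal)
        for k in list(z):
            z[k] = [beta * v for v in z[k]]
        for a in range(N_ACTIONS):
            z[key][a] += (1.0 if a == action else 0.0) - probs[a]

        # grad <- grad + (reward * z - grad) / (t + 1)
        for k in list(z):
            grad[k] = [g + (reward * v - g) / (t + 1) for g, v in zip(grad[k], z[k])]

        internal = obs
    return grad, total_reward / steps


def train(iterations=40, steps=2000, beta=0.8, lr=5.0, seed=0):
    """Plain gradient ascent on the estimate; returns average reward before and after."""
    rng = random.Random(seed)
    theta = new_table()
    _, before = rollout(theta, steps, beta, rng)
    for _ in range(iterations):
        grad, _ = rollout(theta, steps, beta, rng)
        for key, g in grad.items():
            theta[key] = [p + lr * gi for p, gi in zip(theta[key], g)]
    _, after = rollout(theta, steps, beta, rng)
    return before, after


if __name__ == "__main__":
    before, after = train()
    print("average reward before:", before, "after:", after)
```

In this toy task the current observation alone cannot identify the rewarded action, so a memoryless policy cannot do better than chance, while a policy indexed by (observation, internal state) can. The reward is delayed by one step, so the eligibility trace (with `beta` greater than zero) is what carries credit back to the action that earned it.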

Keywords/Search Tags:

POMDP, Reinforcement Learning, Internal State, MAS, Policy Gradient

Related items

1	Research On Fast Policy Gradient Algorithms Of Reinforcement Learning Based On Adaptive Learning Rate
2	Research On Policy Gradient Methods With Variance Related Risk Criteria
3	Theories, Algorithms And Applications Of Policy Gradient Reinforcement Learning
4	Research On Multiagent Cooperation And Applications Based On Reinforcement Learning
5	Research On Learning Of The Optimal Policy In Large-Scale State Space
6	Research On Regularized Policy Gradient
7	Deep Reinforcement Learning Based On Policy Gradient Optimization And Its Application In Agent Control
8	Optimization On Deep Reinforcement Learning Based On Policy Gradient
9	Research On Policy Gradient Methods Based On Functional Gradients
10	Research On Reinforcement Learning Methods Based On Direct Policy Search