
Learning partially observable Markov decision processes using abstract actions

Posted on: 2013-01-12
Degree: M.S
Type: Thesis
University: The University of Texas at Arlington
Candidate: Janzadeh, Hamed
Full Text: PDF
GTID: 2458390008981208
Subject: Computer Science
Abstract/Summary:
Transfer learning and abstraction are active research topics in AI that address the use of learned knowledge to improve learning performance in subsequent tasks. While there has been significant recent work on this topic in fully observable domains, it has been less studied for Partially Observable MDPs. This thesis addresses the problem of transferring skills from previous experiences in POMDP models using high-level actions (options) in two different kinds of algorithms: value iteration and expectation maximization. To do this, the thesis first proves that the optimal value function remains piecewise-linear and convex when policies are made of high-level actions, and explains how value iteration algorithms should be modified to support options. The resulting modifications can be applied to all existing variations of value iteration, and their benefit is demonstrated in an implementation with a basic value iteration algorithm. While the value iteration algorithm is useful for smaller problems, it is strongly dependent on knowledge of the model. To address this, a second algorithm is developed. In particular, the expectation maximization algorithm is modified to learn faster from a set of sampled experiments instead of using exact inference calculations. The goal here is not only to accelerate learning but also to reduce the learner's dependence on complete knowledge of the system model. Using this framework, it is also explained how to incorporate options into the model when learning the POMDP with a hierarchical EM algorithm. Experiments show how adding options can speed up the learning process.
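To illustrate the kind of modification the abstract describes, the sketch below shows a point-based value iteration backup in which each "action" may be an option, summarized by its own multi-step transition model, termination-observation model, expected discounted reward, and effective discount. This is a minimal illustration under those assumptions, not the thesis implementation; the function name, the option models, and the tiny two-state domain are all made up for the example.

```python
# Hedged sketch of a PWLC (alpha-vector) backup over options, not the
# thesis code. An option is assumed to be summarized by (T, O, r, g):
#   T (S,S) multi-step state-transition model of the option,
#   O (S,Z) observation model on option termination,
#   r (S,)  expected discounted reward accumulated while the option runs,
#   g       expected discount accumulated over the option's duration.
import numpy as np

def backup_belief(b, alphas, options):
    """One Bellman backup at belief point b.

    b       : (S,) belief over states
    alphas  : list of (S,) alpha-vectors representing the current
              piecewise-linear convex value function
    options : dict name -> (T, O, r, g) as described above
    returns : the best alpha-vector at b and the option that produced it
    """
    best_alpha, best_name, best_val = None, None, -np.inf
    for name, (T, O, r, g) in options.items():
        S, Z = O.shape
        alpha_o = r.copy()
        for z in range(Z):
            # Project each current alpha-vector back through the option's
            # model for terminal observation z, keep the one best at b.
            cand = [T @ (O[:, z] * a) for a in alphas]
            vals = [b @ c for c in cand]
            alpha_o = alpha_o + g * cand[int(np.argmax(vals))]
        if b @ alpha_o > best_val:
            best_alpha, best_name, best_val = alpha_o, name, b @ alpha_o
    return best_alpha, best_name

# Tiny 2-state, 2-observation example: one primitive-like option and one
# longer "macro" option (all numbers invented for illustration).
prim = (np.array([[0.9, 0.1], [0.2, 0.8]]),
        np.array([[0.8, 0.2], [0.3, 0.7]]),
        np.array([1.0, 0.0]), 0.95)
macro = (np.array([[0.6, 0.4], [0.5, 0.5]]),
         np.array([[0.7, 0.3], [0.4, 0.6]]),
         np.array([1.5, 0.5]), 0.95 ** 3)   # roughly a 3-step option
alphas = [np.zeros(2)]                      # start from V = 0
alpha, choice = backup_belief(np.array([0.5, 0.5]), alphas,
                              {"primitive": prim, "macro": macro})
print(choice, alpha)
```

Because each option carries its own multi-step model, the backup has the same shape as a standard alpha-vector backup over primitive actions, which is the sense in which the value function stays piecewise-linear and convex when policies are built from options.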
Keywords/Search Tags: Using, Value iteration, Algorithm, Observable, Options