
Unified Algorithms For Semi-Markov Decision Processes With Discounted And Average Criteria Based On Performance Potentials By Reinforcement Learning

Posted on: 2007-11-19
Degree: Master
Type: Thesis
Country: China
Candidate: L Zhou
Full Text: PDF
GTID: 2178360182486491
Subject: Computer application technology
Abstract/Summary:
As a special class of stochastic processes, Markov processes are used in many fields of practical life. Markov decision processes (MDPs) and semi-Markov decision processes (SMDPs) are common mathematical models for such stochastic systems. SMDPs are a more general class of model than MDPs: the sojourn time in each state may follow an arbitrary distribution. The performance optimization of MDPs and SMDPs under various criteria is one of the most active research areas.

The concept of Markov performance potentials, introduced by Cao, offers a new framework and approach for the optimization of MDPs. Based on the Poisson equation and the optimality theorem expressed in terms of potentials, many algorithms, such as policy iteration and value iteration, can be derived. In recent years, reinforcement learning has been applied to this literature more and more. It is an important field of artificial intelligence that combines concepts from dynamic programming, simulation-based stochastic approximation, and function approximation, and it has been shown to supply optimal or near-optimal solutions to large MDPs/SMDPs even when they are considered intractable for traditional dynamic programming methods.

Exploiting the relationship between performance potentials and reinforcement learning, this thesis studies performance optimization algorithms for SMDPs based on simulation-based approximation of potentials. First, by defining an equivalent infinitesimal generator for an SMDP, the SMDP can be converted into an equivalent uniformized Markov chain, so that some conclusions for MDPs can be extended to SMDPs. Second, from the definition of potentials based on the Poisson equation and a sample path, a unified reinforcement-learning approximation formula for potentials is obtained under both the discounted and average criteria. Furthermore, neuro-dynamic programming (NDP), an approach that can solve optimization problems for large-scale discrete-event dynamic systems, is introduced; a unified optimization algorithm for SMDPs under the two criteria is derived via an NDP critic model based on performance potentials. Then, a unified optimization approach for SMDPs under the two criteria based on Q-learning, an efficient method for models with incomplete information, is discussed. From some results relating Q-factors and potentials, a unified Bellman optimality equation and a learning formula for the Q-factors are provided. In addition, the thesis also considers optimization problems for multichain SMDPs with a compact action set; under some assumptions, policy iteration algorithms for multichain SMDPs based on potentials are presented. Optimization results are also illustrated through numerical examples of SMDPs.
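To make the uniformization and potential definitions above concrete, the following sketch states the standard relations from Cao's potential framework; the notation (equivalent infinitesimal generator A, uniformization rate β, uniformized transition matrix P, potential vector g, stationary distribution π, performance function f) is assumed here for illustration rather than quoted from the thesis:

```latex
% Sketch under assumed notation: A is the SMDP's equivalent
% infinitesimal generator and \beta a uniformization rate, so the
% equivalent uniformized Markov chain has transition matrix
\[
  P \;=\; I + \frac{1}{\beta}\,A ,
  \qquad \beta \;\ge\; \max_i |A_{ii}| .
\]
% Under the average criterion, the potential vector g is defined
% (up to an additive constant) by the Poisson equation, where f is
% the performance function, \pi the stationary distribution of P,
% \eta = \pi f the average performance, and e the all-ones vector:
\[
  (I - P)\,g + \eta\, e \;=\; f .
\]
```

Solving the Poisson equation directly requires the transition matrix, which is why the thesis turns to sample-path-based reinforcement-learning estimates of g instead.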
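The Q-learning step can likewise be illustrated with a generic simulation sketch. The code below is a minimal average-criterion Q-factor update for an SMDP in the spirit of SMART-type algorithms; it is not the thesis's exact learning formula, and the environment interface (env.reset, env.step, env.actions) is hypothetical:

```python
import random
from collections import defaultdict

def smdp_q_learning(env, episodes=1000, steps=200, alpha=0.1, eps=0.1):
    """Generic average-criterion Q-learning sketch for an SMDP.

    Assumed (hypothetical) simulator interface:
      env.reset() -> state
      env.actions(state) -> non-empty list of feasible actions
      env.step(state, action) -> (next_state, reward, sojourn_time)
    """
    Q = defaultdict(float)            # Q-factors, keyed by (state, action)
    total_reward = total_time = 0.0
    rho = 0.0                         # running estimate of average reward rate

    for _ in range(episodes):
        s = env.reset()
        for _ in range(steps):        # truncate each simulated trajectory
            acts = env.actions(s)
            # epsilon-greedy exploration over the current Q-factors
            if random.random() < eps:
                a = random.choice(acts)
            else:
                a = max(acts, key=lambda b: Q[(s, b)])
            s2, r, tau = env.step(s, a)
            # relative-value update: the sampled reward is offset by
            # rho * tau so rewards accumulated over different sojourn
            # times become comparable as rates
            target = r - rho * tau + max(Q[(s2, b)] for b in env.actions(s2))
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            # update the average-reward-rate estimate from the sample path
            total_reward += r
            total_time += tau
            rho = total_reward / total_time
            s = s2
    return Q, rho
```

The sojourn time enters the update only through the offset term rho * tau; under the discounted criterion, the analogous update would instead discount the successor Q-factor by a factor depending on the sojourn time, which is what allows the two criteria to be handled in a unified form.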
Keywords/Search Tags: Semi-Markov decision processes, Performance potentials, Uniformized Markov chain, Reinforcement learning, Neuro-dynamic programming.