
Unified Algorithms For Semi-Markov Decision Processes With Discounted And Average Criteria Based On Performance Potentials By Reinforcement Learning

Posted on: 2007-11-19
Degree: Master
Type: Thesis
Country: China
Candidate: L Zhou
Full Text: PDF
GTID: 2178360182486491
Subject: Computer application technology
Abstract/Summary:
As a special class of stochastic processes, Markov processes are used in many fields of practical life. Markov decision processes (MDPs) and semi-Markov decision processes (SMDPs) are common mathematical models for such stochastic systems. SMDPs are a more general class of model than MDPs: the sojourn time in each state may follow an arbitrary distribution. The performance optimization of MDPs and SMDPs under various criteria is one of the most active research areas.

The concept of Markov performance potentials, introduced by Cao, offers a new framework and approach for the optimization of MDPs. Based on the Poisson equation and the optimality theorem expressed in terms of potentials, many algorithms, such as policy iteration and value iteration, can be derived. In recent years, reinforcement learning has been applied to this literature more and more. It is an important field of artificial intelligence that combines concepts from dynamic programming, simulation-based stochastic approximation, and function approximation, and it has been shown to supply optimal or near-optimal solutions to large MDPs/SMDPs even when they are considered intractable for traditional dynamic programming methods.

Exploiting the relationship between performance potentials and reinforcement learning, this thesis studies performance optimization algorithms for SMDPs based on simulation-based approximation of potentials. First, by defining an equivalent infinitesimal generator for an SMDP, the SMDP can be converted into an equivalent uniformized Markov chain, so that some conclusions for MDPs can be extended to SMDPs. Second, from the definition of potentials based on the Poisson equation and a sample path, a unified reinforcement-learning approximation formula for potentials is obtained under both the discounted and average criteria. Furthermore, neuro-dynamic programming (NDP), an approach that can solve optimization problems for large-scale discrete-event dynamic systems, is introduced; a unified optimization algorithm for SMDPs under the two criteria is derived via an NDP critic model based on performance potentials. Then, a unified optimization approach for SMDPs under the two criteria based on Q-learning, an efficient method for models with incomplete information, is discussed. From some results relating Q-factors and potentials, a unified Bellman optimality equation and a learning formula for the Q-factors are provided. In addition, the thesis also considers optimization problems for multichain SMDPs with a compact action set; under some assumptions, policy iteration algorithms for multichain SMDPs based on potentials are presented. Optimization results are also illustrated through numerical examples of SMDPs.
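To make the uniformization and potential definitions above concrete, the following sketch states the standard relations from Cao's potential framework; the notation (equivalent infinitesimal generator A, uniformization rate β, uniformized transition matrix P, potential vector g, stationary distribution π, performance function f) is assumed here for illustration rather than quoted from the thesis:

```latex
% Sketch under assumed notation: A is the SMDP's equivalent
% infinitesimal generator and \beta a uniformization rate, so the
% equivalent uniformized Markov chain has transition matrix
\[
  P \;=\; I + \frac{1}{\beta}\,A ,
  \qquad \beta \;\ge\; \max_i |A_{ii}| .
\]
% Under the average criterion, the potential vector g is defined
% (up to an additive constant) by the Poisson equation, where f is
% the performance function, \pi the stationary distribution of P,
% \eta = \pi f the average performance, and e the all-ones vector:
\[
  (I - P)\,g + \eta\, e \;=\; f .
\]
```

Solving the Poisson equation directly requires the transition matrix, which is why the thesis turns to sample-path-based reinforcement-learning estimates of g instead.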
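The Q-learning step can likewise be illustrated with a generic simulation sketch. The code below is a minimal average-criterion Q-factor update for an SMDP in the spirit of SMART-type algorithms; it is not the thesis's exact learning formula, and the environment interface (env.reset, env.step, env.actions) is hypothetical:

```python
import random
from collections import defaultdict

def smdp_q_learning(env, episodes=1000, steps=200, alpha=0.1, eps=0.1):
    """Generic average-criterion Q-learning sketch for an SMDP.

    Assumed (hypothetical) simulator interface:
      env.reset() -> state
      env.actions(state) -> non-empty list of feasible actions
      env.step(state, action) -> (next_state, reward, sojourn_time)
    """
    Q = defaultdict(float)            # Q-factors, keyed by (state, action)
    total_reward = total_time = 0.0
    rho = 0.0                         # running estimate of average reward rate

    for _ in range(episodes):
        s = env.reset()
        for _ in range(steps):        # truncate each simulated trajectory
            acts = env.actions(s)
            # epsilon-greedy exploration over the current Q-factors
            if random.random() < eps:
                a = random.choice(acts)
            else:
                a = max(acts, key=lambda b: Q[(s, b)])
            s2, r, tau = env.step(s, a)
            # relative-value update: the sampled reward is offset by
            # rho * tau so rewards accumulated over different sojourn
            # times become comparable as rates
            target = r - rho * tau + max(Q[(s2, b)] for b in env.actions(s2))
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            # update the average-reward-rate estimate from the sample path
            total_reward += r
            total_time += tau
            rho = total_reward / total_time
            s = s2
    return Q, rho
```

The sojourn time enters the update only through the offset term rho * tau; under the discounted criterion, the analogous update would instead discount the successor Q-factor by a factor depending on the sojourn time, which is what allows the two criteria to be handled in a unified form.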
Keywords/Search Tags: Semi-Markov decision processes, Performance potentials, Uniformized Markov chain, Reinforcement learning, Neuro-dynamic programming.