Performance Sensitivity Analysis And Optimization Of Extended Markov Decision Processes

Posted on:2007-10-29

Degree:Doctor

Type:Dissertation

Country:China

Candidate:Y J Li

Full Text:PDF

GTID:1118360185951471

Subject:Control theory and control engineering

Abstract/Summary:

With the development of science and technology, large numbers of complex stochastic systems have appeared in many areas, including communication (Internet and wireless), manufacturing, intelligent robotics, and traffic management. Performance optimization of these systems has become a research focus in several disciplines, including perturbation analysis (PA) of discrete event dynamic systems in control theory, Markov decision processes (MDPs) in operations research, and reinforcement learning (RL) or neuro-dynamic programming (NDP) in computer science and artificial intelligence. Although these areas take different perspectives and formulate the structure of these systems differently, they share a common goal: to make the best decisions to optimize system performance.

Recently, a performance optimization method based on a sensitivity point of view has been shown to explain and unify the above methods. Potential theory is the basis of this method. Using two types of performance sensitivity formulas, one for performance derivatives and the other for performance differences, existing results in these areas and the relations among them can be derived or explained in a simple and intuitive way. The method can not only obtain the optimal policy from theoretical values, but also improve system performance online from a single sample path, even when the system parameters are unknown. In this sense, it can alleviate both the curse of dimensionality and the curse of modeling. However, the method has mainly been studied for Markov systems and rarely for non-Markov systems. Building on this method, this dissertation focuses on performance sensitivity analysis and performance optimization for semi-Markov decision processes (SMDPs) and partially observable Markov decision processes (POMDPs). SMDPs and POMDPs are two different extensions of MDPs.
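As a minimal sketch of the two sensitivity formulas mentioned above, the following Python example works through a hypothetical two-state ergodic Markov chain under two policies. It assumes the standard definitions from the performance-potential literature, which the abstract itself does not spell out: the potential vector g solves (I - P + e*pi) g = f, the performance difference is eta' - eta = pi' [(P' - P) g + (f' - f)], and the derivative along the line from (P, f) to (P', f') is pi [(P' - P) g + (f' - f)]. All names and numbers (`P1`, `f1`, `P2`, `f2`, `solve`, `potentials`) are illustrative assumptions, not taken from the dissertation.

```python
from fractions import Fraction as F

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with exact rationals."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # find a pivot row
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                k = M[r][c]
                M[r] = [x - k * y for x, y in zip(M[r], M[c])]
    return [row[n] for row in M]

def stationary(P):
    """Stationary distribution: pi (I - P) = 0 with sum(pi) = 1."""
    n = len(P)
    A = [[(1 if i == j else 0) - P[j][i] for j in range(n)] for i in range(n)]
    A[-1] = [F(1)] * n  # replace the redundant equation by normalization
    return solve(A, [F(0)] * (n - 1) + [F(1)])

def potentials(P, f, pi):
    """Performance potentials g solving (I - P + e*pi) g = f."""
    n = len(P)
    A = [[(1 if i == j else 0) - P[i][j] + pi[j] for j in range(n)]
         for i in range(n)]
    return solve(A, f)

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

# Hypothetical current policy (P1, f1) and alternative policy (P2, f2).
P1 = [[F(1, 2), F(1, 2)], [F(1, 4), F(3, 4)]]
f1 = [F(1), F(3)]
P2 = [[F(1, 5), F(4, 5)], [F(1, 2), F(1, 2)]]
f2 = [F(2), F(3)]

pi1, pi2 = stationary(P1), stationary(P2)
eta1, eta2 = dot(pi1, f1), dot(pi2, f2)  # long-run average rewards
g1 = potentials(P1, f1, pi1)             # potentials of the current policy only

# h(i) = sum_j (P2 - P1)(i, j) * g1(j) + (f2 - f1)(i)
h = [dot([a - b for a, b in zip(P2[i], P1[i])], g1) + f2[i] - f1[i]
     for i in range(2)]

difference = dot(pi2, h)  # difference formula: weighted by the NEW policy's pi
derivative = dot(pi1, h)  # derivative formula: weighted by the CURRENT policy's pi
```

Because the arithmetic is exact, `difference` coincides with `eta2 - eta1`. The two formulas differ only in which stationary distribution weights `h`: the alternative policy's for the difference, the current policy's for the derivative.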
In an SMDP, the sojourn time follows a general distribution rather than an exponential one. In a POMDP, the state cannot be observed directly; instead, an observation associated with the state is drawn according to a probability distribution. These features allow practical systems to be described more realistically, so the results and algorithms in this dissertation apply to the optimization of a wider range of practical systems.

In this dissertation, using equivalent Markov decision processes, two infinitesimal matrices are introduced for SMDPs under the average and discounted performance criteria. The potentials of SMDPs are then defined using these matrices and the performance difference formula and the performance derivative formula...
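The idea of an equivalent Markov process for an SMDP can be illustrated for the average criterion with a minimal sketch. It assumes one common construction, which is not confirmed to be the one used in the dissertation: for a fixed policy with embedded transition matrix P and mean sojourn times m(i), take A(i, j) = (P(i, j) - delta_ij) / m(i). The stationary distribution of A then equals the long-run fraction of time the semi-Markov process spends in each state, so only the mean of the sojourn-time distribution matters here. The three-state model, the reward rates `f`, and all names are hypothetical.

```python
from fractions import Fraction as F

def stationary(Q):
    """Return the probability row vector x with x Q = 0 and sum(x) = 1."""
    n = len(Q)
    # Transposed system Q^T x = 0, last equation replaced by sum(x) = 1.
    M = [[Q[j][i] for j in range(n)] + [F(0)] for i in range(n)]
    M[-1] = [F(1)] * n + [F(1)]
    for c in range(n):  # Gauss-Jordan elimination in exact arithmetic
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                k = M[r][c]
                M[r] = [a - k * b for a, b in zip(M[r], M[c])]
    return [row[n] for row in M]

# Hypothetical SMDP under one fixed policy.
P = [[F(0), F(1, 2), F(1, 2)],      # embedded (jump) chain
     [F(1, 3), F(0), F(2, 3)],
     [F(1, 2), F(1, 2), F(0)]]
m = [F(2), F(1, 2), F(3)]           # mean sojourn time in each state
f = [F(1), F(4), F(2)]              # reward rate in each state
n = len(P)

# Generator of the embedded chain (P - I) and the assumed equivalent generator A.
Q = [[P[i][j] - (1 if i == j else 0) for j in range(n)] for i in range(n)]
A = [[Q[i][j] / m[i] for j in range(n)] for i in range(n)]

nu = stationary(Q)                  # stationary distribution of the embedded chain
pi = stationary(A)                  # stationary distribution of the equivalent process

mean_cycle = sum(nu[i] * m[i] for i in range(n))
time_fraction = [nu[i] * m[i] / mean_cycle for i in range(n)]

eta = sum(pi[i] * f[i] for i in range(n))  # average reward per unit time
```

Here `pi` matches `time_fraction`, the renewal-reward expression for the time the semi-Markov process spends in each state, which is what makes the Markov-chain sensitivity formulas reusable for the average criterion under this assumed construction.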

Keywords/Search Tags:

Semi-Markov decision processes (SMDPs), Partially observable Markov decision processes (POMDPs), Performance optimization, Performance sensitivity analysis, Event-based optimization, Switching processes

Related items

1	Unified Algorithms For Semi-Markov Decision Processes With Discounted And Average Criteria Based On Performance Potentials By Reinforcement Learning
2	Exploiting structure to efficiently solve large scale partially observable Markov decision processes
3	Algorithms for partially observable Markov decision processes
4	Increasing scalability in algorithms for centralized and decentralized partially observable Markov decision processes: Efficient decision-making and coordination in uncertain environments
5	Semi-Markov Switching State-Space Control Processes And Its Applications
6	Resource Management Research Based On Markov Decision Processes In Wireless Networks
7	Learning partially observable Markov decision processes using abstract actions
8	Pond-hindsight: Applying hindsight optimization to partially-observable markov decision processes
9	Hierarchical learning and planning in partially observable Markov decision processes
10	Finite memory policies for partially observable Markov decision processes