Performance Sensitivity Analysis And Optimization Of Extended Markov Decision Processes

Posted on: 2007-10-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y J Li
Full Text: PDF
GTID: 1118360185951471
Subject: Control theory and control engineering
Abstract/Summary:
With the development of science and technology, large numbers of complicated stochastic systems have appeared in many areas, including communication (Internet and wireless), manufacturing, intelligent robotics, and traffic management. The performance optimization of these systems is a research focus of several disciplines, including perturbation analysis (PA) of discrete event dynamic systems in control systems, Markov decision processes (MDPs) in operations research, and reinforcement learning (RL) or neuro-dynamic programming (NDP) in computer science and artificial intelligence. Although these areas take different perspectives and use different formulations for the structure of such systems, they share a common goal: to make the best decisions to optimize system performance.

Recently, a performance optimization method based on a sensitivity point of view has been shown to explain and unify the above methods. Potential theory is the basis of this method. Using two types of performance sensitivity formulas, one for performance derivatives and the other for performance differences, existing results in these areas and the relations among them can be derived or explained in a simple and intuitive way. The method can not only obtain the optimal policy by solving for the theoretical values, but also improve system performance online based on a single sample path, even if the system parameters are unknown. To some extent, it therefore addresses both the curse of dimensionality and the curse of modeling. However, the method has mainly been applied to Markov systems and has rarely been applied to non-Markov systems.

Building on this method, this dissertation focuses on performance sensitivity analysis and performance optimization for semi-Markov decision processes (SMDPs) and partially observable Markov decision processes (POMDPs). SMDPs and POMDPs are two different extensions of MDPs.
In an SMDP, the sojourn time is not exponentially distributed but follows a general distribution. In a POMDP, the state cannot be observed directly; instead, an observation associated with the state is drawn according to a probability distribution. These features allow practical systems to be described more realistically, so the results and algorithms in this dissertation apply to the optimization of a wider range of practical systems.

In this dissertation, by using equivalent Markov decision processes, two infinitesimal matrices are introduced for SMDPs, one under the average performance criterion and one under the discounted performance criterion. Then, the potentials of SMDPs are defined by using these matrices and the performance difference formula and the performance derivative formula...
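
The two sensitivity formulas mentioned in the abstract can be illustrated for the simplest case, a finite ergodic discrete-time Markov chain under the average-reward criterion. The sketch below is not taken from the dissertation and does not cover the SMDP or POMDP extensions; the symbols `P`, `f`, `pi`, `g` and all function names are assumptions chosen for illustration. It uses the standard Poisson equation (I - P + e pi) g = f for the potentials g, the difference formula eta' - eta = pi' [(P' - P) g + (f' - f)], and the derivative formula d eta / d delta = pi [(P' - P) g + (f' - f)] along the line P + delta (P' - P), f + delta (f' - f).

```python
import numpy as np


def stationary_distribution(P):
    """Solve pi P = pi with sum(pi) = 1 for an ergodic transition matrix P."""
    n = P.shape[0]
    # Stack the balance equations with the normalization constraint.
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi


def potentials(P, f):
    """Solve the Poisson equation (I - P + e pi) g = f for the potentials g."""
    n = P.shape[0]
    pi = stationary_distribution(P)
    e = np.ones((n, 1))
    return np.linalg.solve(np.eye(n) - P + e @ pi[None, :], f)


def average_reward(P, f):
    """Long-run average reward eta = pi f."""
    return float(stationary_distribution(P) @ f)


def performance_difference(P, f, P_new, f_new):
    """eta' - eta computed from the potentials of the ORIGINAL chain."""
    g = potentials(P, f)
    pi_new = stationary_distribution(P_new)
    return float(pi_new @ ((P_new - P) @ g + (f_new - f)))


def performance_derivative(P, f, P_new, f_new):
    """Directional derivative of eta at delta = 0 toward (P_new, f_new)."""
    g = potentials(P, f)
    pi = stationary_distribution(P)
    return float(pi @ ((P_new - P) @ g + (f_new - f)))


# Hypothetical 3-state example: two policies with different dynamics and rewards.
P = np.array([[0.5, 0.3, 0.2], [0.2, 0.6, 0.2], [0.3, 0.3, 0.4]])
P_new = np.array([[0.2, 0.5, 0.3], [0.4, 0.4, 0.2], [0.1, 0.3, 0.6]])
f = np.array([1.0, 2.0, 4.0])
f_new = np.array([1.5, 2.0, 3.5])

diff = performance_difference(P, f, P_new, f_new)
deriv = performance_derivative(P, f, P_new, f_new)
print("difference formula:", diff)
print("direct difference: ", average_reward(P_new, f_new) - average_reward(P, f))
print("derivative formula:", deriv)
```

Both formulas need only the potentials of the current chain, which is the sense in which a policy can be compared or improved without re-solving the system for every candidate.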
Keywords/Search Tags: Semi-Markov decision processes (SMDPs), Partially observable Markov decision processes (POMDPs), Performance optimization, Performance sensitivity analysis, Event-based optimization, Switching processes