
Aggregation and multi-mode switching control in Markov decision processes

Posted on: 2003-08-13
Degree: Ph.D.
Type: Dissertation
University: Carnegie Mellon University
Candidate: Ren, Zhiyuan
Full Text: PDF
GTID: 1468390011482078
Subject: Engineering
Abstract/Summary:
This dissertation consists of two major research themes for Markov decision processes (MDPs): aggregation and multi-mode switching.

There are two types of aggregation: time aggregation and state aggregation. From a control point of view, time aggregation reduces the control effort by applying the same control at all control epochs within the same period of time; state aggregation reduces the control effort by applying the same control to all states in the same state subclass. We provide results on both time aggregation and state aggregation for solving large MDP problems.

To model nonstationary system behavior, we introduce the concept of multi-mode MDPs. The system is modeled as an MDP whose transition and cost parameters are functions of a mode variable. The mode evolves stochastically and is modeled as a controlled Markov chain whose dynamics may be affected by the system evolution. The combined system-and-mode model is an MDP in which each state has two components, the system state and the mode, but the “curse of dimensionality” makes it difficult, if not impossible, to obtain an exact solution to the whole problem.

Our approach to controlling multi-mode systems is to introduce switching, or supervisory, control. The idea is to design a collection of controllers off-line, each of which controls the system reasonably well in a certain situation (i.e., mode), and then to switch among these controllers on-line to accommodate the actual situation. For multi-mode MDPs, we study three switching control schemes, based on mode matching, time aggregation, and state aggregation, respectively, which can achieve near-optimal performance; a minimal sketch of the mode-matching scheme is given after this abstract. We also apply the results to a problem of dynamic power management in computing systems.

In addition to aggregation and multi-mode switching, we also provide new results for general MDPs, including: (i) an adaptive control scheme for average-cost MDP problems based on Q-learning; and (ii) a formulation of, and solution to, fractional-cost MDP problems.
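The following is a minimal illustrative sketch, not code from the dissertation. It assumes a small finite multi-mode MDP with mode-dependent transition probabilities P[m] and costs C[m] (hypothetical names and made-up numbers), designs one controller per mode off-line by value iteration on the fixed-mode discounted-cost MDP, and then switches among the controllers on-line by mode matching.

import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, n_modes = 4, 2, 2

# Hypothetical mode-dependent model: P[m][s, a, s'] transition probabilities, C[m][s, a] costs.
P = [rng.dirichlet(np.ones(n_states), size=(n_states, n_actions)) for _ in range(n_modes)]
C = [rng.uniform(0.0, 1.0, size=(n_states, n_actions)) for _ in range(n_modes)]

def design_controller(m, gamma=0.95, iters=500):
    # Off-line design: value iteration on the fixed-mode, discounted-cost MDP for mode m.
    V = np.zeros(n_states)
    Q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        Q = C[m] + gamma * P[m] @ V   # Q[s, a] = immediate cost + expected discounted value
        V = Q.min(axis=1)
    return Q.argmin(axis=1)           # cost-minimizing (greedy) policy for mode m

controllers = [design_controller(m) for m in range(n_modes)]

def switching_control(system_state, mode):
    # On-line mode matching: apply the controller designed for the observed mode.
    return controllers[mode][system_state]

# Simulate the coupled system-and-mode process; the mode chain here is an
# illustrative two-state Markov chain that ignores the system state.
mode_P = np.array([[0.9, 0.1], [0.2, 0.8]])
s, m = 0, 0
for t in range(5):
    a = switching_control(s, m)
    s = rng.choice(n_states, p=P[m][s, a])
    m = rng.choice(n_modes, p=mode_P[m])
    print(f"t={t}  state={s}  mode={m}  action={a}")

The time-aggregation and state-aggregation switching schemes studied in the dissertation replace this simple mode-matching rule with switching decisions based on aggregated control epochs or aggregated states; the sketch only illustrates the basic system-and-mode structure.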
Keywords/Search Tags: Aggregation, MDP, Markov, MDPs