
Simulation-based algorithms for Markov decision processes

Posted on: 2003-11-14
Degree: Ph.D
Type: Dissertation
University: University of Maryland, College Park
Candidate: He, Ying
GTID: 1468390011479518
Subject: Engineering
Abstract/Summary:
Problems of sequential decision making under uncertainty are common in manufacturing, computer, and communication systems, and many such problems can be formulated as Markov Decision Processes (MDPs). Motivated by a capacity expansion and allocation problem in semiconductor manufacturing, we formulate a fab-level decision-making problem using a finite-horizon transient MDP model that can integrate the life-cycle dynamics of the fab and provide a trade-off between immediate and future benefits and costs.

However, for large and complicated systems formulated as MDPs, the classical methodology for computing optimal policies, dynamic programming, suffers from the so-called "curse of dimensionality" (the computational requirement grows exponentially with the number of states and controls) and the "curse of modeling" (an explicit model for the cost structure and/or the transition probabilities is not available).

In the problem settings to which our approaches apply, outputs are available from either a simulation model or the actual system in place of explicit transition probabilities. Our methodology is first to find the structure of optimal policies for some special cases, then to use that structure to construct parameterized heuristic policies for more general cases, and finally to implement simulation-based algorithms to determine the parameters of the heuristic policies. For the fab-level decision-making problem, we analyze the structure of the optimal policy for a special "one-machine, two-product" case and discuss the applicability of simulation-based algorithms.

We develop several simulation-based algorithms for MDPs to overcome the "curse of dimensionality" and the "curse of modeling", considering both theoretical and practical issues. First, we develop a simulation-based policy iteration algorithm for average-cost problems under a unichain assumption, relaxing the common recurrent-state assumption.
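To illustrate the general idea of simulation-based policy iteration, the following is a minimal sketch, not the algorithm developed in the dissertation: it uses a discounted-cost Monte Carlo rollout evaluation on a made-up two-state, two-action MDP (the dissertation's algorithm addresses average-cost problems under a unichain assumption). The key point it shares with the simulation-based setting is that the algorithm only draws sample transitions from a simulator and never touches the transition probabilities directly.

```python
import numpy as np

# A tiny two-state, two-action MDP available only through a simulator:
# the algorithm sees sampled transitions and costs, never the matrix P
# itself, mirroring the "curse of modeling" setting. Dynamics are illustrative.
P = {  # P[(state, action)] = next-state distribution over states {0, 1}
    (0, 0): [0.9, 0.1], (0, 1): [0.2, 0.8],
    (1, 0): [0.7, 0.3], (1, 1): [0.1, 0.9],
}
COST = {(0, 0): 1.0, (0, 1): 0.3, (1, 0): 2.0, (1, 1): 0.5}
rng = np.random.default_rng(0)

def step(s, a):
    """Simulate one transition; only samples are exposed to the algorithm."""
    return rng.choice(2, p=P[(s, a)]), COST[(s, a)]

def rollout_cost(s, a, policy, horizon=50, beta=0.9):
    """Monte Carlo estimate of Q(s, a): take action a, then follow the policy."""
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        s_next, c = step(s, a)
        total += discount * c
        discount *= beta
        s, a = s_next, policy[s_next]
    return total

def simulation_based_policy_iteration(n_sweeps=10, n_rollouts=200):
    policy = {0: 0, 1: 0}  # arbitrary initial policy
    for _ in range(n_sweeps):
        # Policy evaluation by simulation: average many rollouts per (s, a).
        q = {(s, a): np.mean([rollout_cost(s, a, policy)
                              for _ in range(n_rollouts)])
             for s in (0, 1) for a in (0, 1)}
        # Policy improvement: act greedily with respect to the Q estimates.
        policy = {s: min((0, 1), key=lambda a: q[(s, a)]) for s in (0, 1)}
    return policy

print(simulation_based_policy_iteration())
```

For these illustrative costs, action 1 is cheaper in both states, so greedy improvement settles on the policy choosing action 1 everywhere; the sampling noise in the Q estimates is what the dissertation's convergence analysis must contend with.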
Second, for weighted-cost problems, we develop a new two-timescale simulation-based gradient algorithm based on perturbation analysis, provide a theoretical convergence proof, and compare it with two recently proposed simulation-based gradient algorithms. Third, we propose two new Simultaneous Perturbation Stochastic Approximation (SPSA) algorithms for weighted-cost problems and verify their effectiveness via simulation; we then consider a general SPSA algorithm for function minimization and show its convergence under a weaker assumption: the function need not be differentiable.
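For context, the basic first-order SPSA recursion (due to Spall) that the dissertation's algorithms build on can be sketched as follows. This is a generic illustration, not the dissertation's weighted-cost algorithms; the gain-sequence constants and the test objective are assumptions chosen for the example. SPSA's appeal in the simulation setting is that each iteration needs only two (possibly noisy) evaluations of the objective, regardless of the parameter dimension.

```python
import numpy as np

def spsa_minimize(f, theta0, n_iter=500, a=0.1, c=0.1,
                  alpha=0.602, gamma=0.101, seed=0):
    """Basic SPSA: estimate the gradient of f from two function
    evaluations per iteration using a random simultaneous perturbation."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    for k in range(1, n_iter + 1):
        ak = a / k**alpha                 # decaying step-size sequence
        ck = c / k**gamma                 # decaying perturbation size
        delta = rng.choice([-1.0, 1.0], size=theta.shape)  # Rademacher signs
        # Two-sided simultaneous-perturbation gradient estimate.
        g_hat = (f(theta + ck * delta) - f(theta - ck * delta)) / (2 * ck * delta)
        theta = theta - ak * g_hat
    return theta

# Usage: minimize a simple quadratic; iterates should approach (1, -2).
theta = spsa_minimize(lambda x: (x[0] - 1)**2 + (x[1] + 2)**2, [5.0, 5.0])
```

Note that the gradient estimate never differentiates f; this is what makes it natural to weaken the differentiability assumption, as done in the dissertation's convergence analysis of a general SPSA algorithm.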
Keywords/Search Tags: Simulation-based algorithms, Decision, Problem