
Performance Potential-based NDP Optimization Approaches And Application Research For SMDP

Posted on: 2007-01-23  Degree: Master  Type: Thesis
Country: China  Candidate: D Chen  Full Text: PDF
GTID: 2178360182986281  Subject: Computer application technology
Abstract/Summary:
Discrete event dynamic systems (DEDSs) are man-made systems that arise widely in the real world, and the semi-Markov decision process (SMDP) is one of the major models for such systems. To meet the optimal-control requirements of DEDSs with large state spaces, we focus on applying neuro-dynamic programming (NDP), based on reinforcement learning (RL), to SMDP performance optimization.

The performance potential plays an important role in the performance analysis of SMDPs. From the performance potential theory and the Bellman optimality equation, it is straightforward to establish an optimality equation, which we call the performance potential-based Bellman optimality equation, for both the average-cost and the discounted-cost criteria. The equation can be solved by classical methods such as value iteration and policy iteration, with the control information stored in a look-up table. However, modern DEDSs are generally complex systems: solving the equation exactly requires storing a large amount of information, and memory may even overflow when large matrix inversions are involved. The resulting "curse of dimensionality" can make the problem intractable. Simulation-based methods offer an alternative, and NDP methods based on RL are effective here: they use function approximation or a neural architecture to represent the performance function or the policy, and therefore need only a comparatively small amount of memory for the parameters.

NDP comprises three classes of methods: critic, actor, and actor-critic. This thesis centers on the actor and actor-critic methods. In both, the SMDP is first transformed into an equivalent MDP and then into a uniformized equivalent chain, so that performance optimization can be carried out with the optimization theory of the uniformized chain. In the actor method, a neural network approximates the policy. Based on a single sample path of the uniformized chain, we first learn the performance potential, whose estimate serves as the approximate policy evaluation of the actor algorithm; based on the potential estimates, we then perform approximate policy updating, improving the network parameters and hence the policy. For network training we provide two parameter-improvement schemes: a steepest-descent method and a sample-training method. In the actor-critic method, two neural networks represent the potential and the policy separately. Policy evaluation and policy updating are similar to the actor method, except that the parameterized potential learning is carried out by steepest descent along a single simulated sample path. In both methods we present a unified TD(λ) learning formula covering the discounted and average criteria, and a numerical example demonstrates the effectiveness of the corresponding algorithms.
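For reference, one common form of the ingredients just described, written for the average-cost criterion on the uniformized chain, is sketched below; the notation is generic and is not necessarily that of the thesis.

Potential-based Bellman optimality equation (average cost):
\eta^* + g^*(i) = \min_{a \in A(i)} \Big[ f(i,a) + \sum_{j} p(i,j;a)\, g^*(j) \Big]

TD(λ) learning of a parameterized potential g_w along a single sample path:
\delta_t = f(i_t,a_t) - \hat\eta_t + g_w(i_{t+1}) - g_w(i_t), \qquad
z_t = \lambda z_{t-1} + \nabla_w g_w(i_t), \qquad
w_{t+1} = w_t + \alpha_t \delta_t z_t,

with \hat\eta_t maintained as a running average of the observed one-step costs; for the discounted criterion the TD error takes the form \delta_t = f(i_t,a_t) + \beta\, g_w(i_{t+1}) - g_w(i_t) with discount factor \beta, and the average-cost term drops out.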
The call admission control (CAC) problem is very common in telecommunication networks, and CAC for a single link can be formulated as a continuous-time Markov decision problem. As before, the large state space of a CAC system can lead to the curse of dimensionality. We therefore apply NDP methods to CAC and study all three methods (critic, actor, and actor-critic) in its optimal control. In all three, the performance potential learning or parameterized learning is similar to that in Chapters 3 and 4; in the actor and actor-critic methods, the policy approximated by the network is a randomized policy. Simulation results show the effectiveness of the NDP methods for CAC.
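To make the CAC setting concrete, the sketch below simulates a hypothetical single-link system with one call class and learns the performance potential by TD(λ) along a single sample path of the uniformized chain, under a fixed randomized admission policy. It is a minimal illustration only: the model, the parameter values, and the linear feature approximator (standing in for the thesis's neural networks) are assumptions of this sketch, not the thesis's actual experiments.

import numpy as np

# Hypothetical single-link CAC model, one call class, link capacity C.
# State i = number of calls in progress; action a in {0: reject, 1: accept}
# applies to the next arrival. The continuous-time chain is uniformized with
# rate V, and the potential is learned by TD(lambda) along one sample path.
C, lam, mu = 10, 4.0, 1.0           # capacity, arrival rate, per-call departure rate
V = lam + C * mu                    # uniformization constant
rng = np.random.default_rng(0)

def features(i):
    # simple polynomial features for the linear potential approximator
    x = i / C
    return np.array([1.0, x, x * x])

def step(i, a):
    # one transition of the uniformized chain; returns (next state, reward)
    p_arr, p_dep = lam / V, i * mu / V
    u = rng.random()
    if u < p_arr:                   # an arrival occurs
        if a == 1 and i < C:
            return i + 1, 1.0       # call admitted: unit revenue
        return i, 0.0               # call rejected or link full
    if u < p_arr + p_dep:           # a departure occurs
        return i - 1, 0.0
    return i, 0.0                   # fictitious self-transition

def admit_prob(i, theta):
    # randomized admission policy: probability of accepting at state i
    if i >= C:
        return 0.0
    return 1.0 / (1.0 + np.exp(-(theta[0] + theta[1] * i / C)))

# TD(lambda) learning of the potential g_w(i) = w . features(i), average-reward case
w, z = np.zeros(3), np.zeros(3)
eta = 0.0                           # running estimate of the average reward
alpha, beta, lam_td = 0.05, 0.01, 0.7
theta = np.array([1.0, -2.0])       # fixed randomized policy for this sketch
i = 0
for t in range(50_000):
    a = 1 if rng.random() < admit_prob(i, theta) else 0
    j, r = step(i, a)
    delta = r - eta + w @ features(j) - w @ features(i)   # TD error with potential
    z = lam_td * z + features(i)                          # eligibility trace
    w += alpha * delta * z                                # potential parameters
    eta += beta * (r - eta)                               # average-reward estimate
    i = j

print("estimated average reward per uniformized step:", round(eta, 3))

A full actor or actor-critic scheme would additionally update the policy parameters theta from the learned potential estimates, as described above.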
Keywords/Search Tags:semi-Markov decision process, performance potential, uniformized chain, reinforcement learning, neuro-dynamic programming