Parallel Algorithms For Large-Scale Markov Decision Processes Based On Performance Potentials

Posted on:2008-04-21

Degree:Master

Type:Thesis

Country:China

Candidate:B Li

Full Text:PDF

GTID:2178360215951638

Subject:Computer application technology

Abstract/Summary:

PDF Full Text Request

With the development of science and technology, a kind of stochastic artificial system, which is discrete event dynamic system (DEDS), broadly exists in the real world. Markov decision process (MDP) is a major model for these systems, and performance optimization of MDP is always the research focus. The concept of Markov performance potentials provides a new theoretical frame for solving MDP. Based on performance potentials, it is easy to establish theoretical algorithm to solve the MDP system with known parameters. For the system with unknown parameters, we first estimate performance potentials in a sample path, and use offline or online simulation algorithm to obtain an approximate solution.As the natural characteristic of practical large-scale MDP systems, these systems often contain enormous dispersed state space. At the moment, serial computing of optimization methods will be much time-consuming, and even it is unfeasible to gain a solution. Parallel computing appears a good way to improve the efficiency of optimization algorithms. First, parallel computing can really reduce the execution time. Second, parallel simulation methods not only accelerate the convergence speed, but also may be gain more accurate solution. So parallel computing play a strong guiding role in the realization of large-scale MDP systems optimization. Therefore, this thesis mainly research on the parallel algorithms and its application for MDP. In the paper, we argue parallel implementation of theoretical and simulation optimization algorithms for MDP base on the performance potentials.First, we propose the parallelizing value iterative algorithm which divides global state space into several state subset and parallel process optimization algorithms of the state subset in cluster. As the method of traditional stochastic partition state space isn't suitable for large-scale states, we give a heuristic partition strategy and analysis its parallel performance. The heuristic strategy use the waiting time to be a target function, and partition the state space by minimizing the waiting time for all the processors. An experimental result shows that the heuristic partition algorithm can achieve higher speedup than stochastic partition algorithm.We also discuss parallel implementation method of Q-learning and NDP. First, we propose an independent parallel Q-learning algorithm (IPQ) and a state partition parallel Q-leaming algorithm (SPPQ) base on performance potentials, where we mainly discussed the synchronization strategy, that is, how to choose synchronization point, and the building strategy of Q values, that is, how to construct new Q-factors with some of the derived Q-factors. A synchronization strategy is provided by combining fixed step with offset step. In addition, the principle for establishing building strategy is analyzed, and then some methods are provided for obtaining building strategy. Meantime, we present two parallel algorithms of NDP methods based on some neural networks ensemble. For increasing ensemble diversity to improve the generalization ability of learning system, three methods is given in order to create diverse neural networks. Moreover, this thesis gives numerical simulation examples to use three methods and compares the results.Parallelizing rollout algorithms is also proposed. The rollout algorithm has a very strong inherent parallelism, we propose two methods for parallelizing this algorithm to reduce the computation time, and we also analyze their performance. We also derive an extension of the rollout algorithm that is applied to multi-product inventory control and multi-level inventory control.

Keywords/Search Tags:

Markov decision process, performance potential, parallel computing, reinforcement learning, neuro-dynamic programming, Rollout algorithm, neural network ensemble

PDF Full Text Request

Related items

1	Performance Potential-based NDP Optimization Approaches And Application Research For SMDP
2	Unified Algorithms For Semi-Markov Decision Processes With Discounted And Average Criteria Based On Performance Potentials By Reinforcement Learning
3	NDP Optimization For Large-scale Markov Systems Based On Performance Potentials-learning
4	Look-ahead Control For CSPS Model Based On Learning
5	Continuous Time Hierarchical Reinforcement Learning Algorithm
6	Modeling And Optimization For Software-Defined Multimedia Service
7	Reinforcement Learning And Its Applications In Navigation And Control Of Mobile Robots
8	A Deep Reinforcement Learning Based Dynamic Recommender System
9	Study On The Learning And Planning Algorithm Of Intelligent Agent Based On Performance Potentials
10	Inverse Reinforcement Learning Algorithms In Semi-markov Environment