
Agent Sequential Decision-Making Approaches and Their Applications in Uncertain Environments

Posted on: 2014-02-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B Wu
Full Text: PDF
GTID: 1268330401456220
Subject: Computer Science and Technology
Abstract/Summary:
The problem of online planning and learning under uncertainty has received significant attention in the scientific community in recent years. It is now well recognized that accounting for uncertainty during planning and decision-making is imperative to the design of robust computer systems. Partially observable Markov decision processes (POMDPs) provide a rich framework for modeling a wide range of sequential decision-making problems under uncertainty. The POMDP model can optimize sequences of policies that are robust to sensor noise, missing information, and partial observability. However, online planning and learning in POMDPs are often intractable because the belief state space suffers from two curses: dimensionality and history. To date, the use of POMDPs in real-world problems has been limited by the poor scalability of existing solution algorithms, which can solve only small-scale problems. To address these problems, this dissertation studies belief state space compression, online planning, online learning, and applications in wireless sensor networks. The main results and innovations are as follows:

(1) A novel algorithm for belief state space compression in POMDPs using NMF update rules. To tackle the curse of dimensionality when planning over the belief state space, this dissertation presents a novel approach that compresses the belief state space with non-negative matrix factorization (NMF) update rules, reducing its high dimensionality in two steps. First, the algorithm adopts factored representations of states, observations, and actions by exploiting the structure of factored POMDPs; it then decomposes and compresses the transition functions by exploiting the conditional independence and context-specific independence of the underlying dynamic Bayesian networks (DBNs), and removes zero-probability entries to reduce the sparsity of the belief state space. Second, it adopts a value-directed compression approach, so that the approximate belief states induce decisions as close to optimal as possible, and uses NMF update rules instead of Krylov iterations to speed up the dimensionality reduction. The algorithm not only leaves the value function and reward function of the belief states unchanged after the reduction, but also preserves the piecewise-linear and convex property required to compute the optimal policy by dynamic programming. Simulation results demonstrate that the proposed belief compression algorithm is effective in reducing the cost of computing policies while retaining their quality.
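Item (1) builds on the standard NMF multiplicative update rules. The Python sketch below is illustrative only: the function name compress_beliefs, the chosen rank, and the iteration budget are assumptions rather than the dissertation's code, and the value-directed and factored-DBN steps are omitted.

```python
# A minimal sketch of NMF multiplicative updates (Lee & Seung style) applied
# to a set of belief states. Names and parameters here are illustrative.
import numpy as np

def compress_beliefs(B, rank, n_iters=200, eps=1e-9):
    """Factor a nonnegative belief matrix B (n_beliefs x n_states) as B ~ W @ H.

    Each row of W is a low-dimensional (rank-sized) surrogate for the
    corresponding high-dimensional belief; H maps it back to state space.
    """
    n, m = B.shape
    rng = np.random.default_rng(0)
    W = rng.random((n, rank))
    H = rng.random((rank, m))
    for _ in range(n_iters):
        # Multiplicative updates keep W and H nonnegative and monotonically
        # decrease the Frobenius reconstruction error ||B - W H||_F.
        H *= (W.T @ B) / (W.T @ W @ H + eps)
        W *= (B @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy usage: 100 beliefs over 50 states, compressed to 8 dimensions.
B = np.random.default_rng(1).random((100, 50))
B /= B.sum(axis=1, keepdims=True)          # rows are probability distributions
W, H = compress_beliefs(B, rank=8)
print(np.linalg.norm(B - W @ H))           # reconstruction error
```

Each row of W is the compressed stand-in for one belief; because the updates are purely multiplicative, nonnegativity is preserved without any projection step, which is what makes them cheaper per iteration than Krylov-based alternatives.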
(2) A point-based online value iteration algorithm for POMDPs. To address the curse of history in sequential decision-making with POMDPs, this dissertation proposes the point-based online value iteration (PBOVI) algorithm. To speed up POMDP solving, the algorithm performs value backups at specific reachable belief points rather than over the entire belief simplex. It exploits branch-and-bound pruning to prune the AND/OR tree of belief states online, and introduces the idea of reusing belief states computed in previous steps to avoid repeated computation. Simulation results show that the proposed algorithm is effective in reducing the cost of computing policies while retaining their quality, so it can meet the requirements of real-time systems.

(3) Model-based Bayesian reinforcement learning in factored POMDPs. Because model-based Bayesian reinforcement learning suffers from an enormous number of parameters and slow convergence, this dissertation presents a novel model-based Bayesian reinforcement learning method for factored POMDPs. First, factored representations are used to describe the dynamics with few parameters. Then, based on prior knowledge and observed data, model-based reinforcement learning provides an elegant solution to the optimal exploration-exploitation trade-off. Finally, a novel point-based incremental pruning algorithm is presented to speed up convergence. Theoretical and numerical results show that the discrete POMDPs approximate the underlying Bayesian reinforcement learning task well, with guaranteed performance.

(4) POMDP-based energy-efficient solutions in wireless sensor networks. Energy efficiency is a challenging problem in wireless sensor networks (WSNs). To address it, this dissertation develops several energy-efficient algorithms based on the POMDP approaches above. First, it proposes a novel algorithm using generalized-inverse non-negative matrix factorization for energy-efficient communication in WSNs: the algorithm applies NMF to reduce the matrix decomposed by SVD to lower dimensions, and uses the multiplicative update rule to obtain the final dimensionality-reduction result quickly. Second, it proposes a novel belief-reuse algorithm for energy-efficient tracking in sensor networks: this algorithm exploits a maximum-reward heuristic search to approximate the optimal value with respect to tracking performance, and reuses beliefs to avoid repeatedly acquiring belief states, which effectively reduces the sensors' communication energy consumption. Numerical results show that the proposed algorithms are effective in optimizing the trade-off between tracking performance and energy consumption, so they can meet the requirement of high tracking performance at low energy cost. The dissertation contains 41 figures, 11 tables, and 172 references.
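For a concrete picture of the point-based backup at the heart of item (2), the sketch below performs one value backup at each belief in a finite set of reachable points. It is a generic point-based backup under assumed array names and shapes (T, O, R, beliefs are illustrative); the dissertation's PBOVI additionally prunes the AND/OR tree online with branch-and-bound and reuses beliefs, which this sketch omits.

```python
# A minimal sketch of one point-based value backup over reachable beliefs.
import numpy as np

def point_based_backup(beliefs, alphas, T, O, R, gamma=0.95):
    """One value backup at each belief point.

    beliefs: (n_b, S)   reachable belief points
    alphas:  (n_a, S)   current alpha-vectors
    T: (A, S, S) transitions, O: (A, S, n_obs) observations, R: (A, S) rewards
    Returns one new alpha-vector per belief point.
    """
    A, S, _ = T.shape
    n_obs = O.shape[2]
    new_alphas = []
    for b in beliefs:
        best_val, best_vec = -np.inf, None
        for a in range(A):
            # g(s) = R(a,s) + gamma * sum_o max_alpha sum_s' T[a,s,s'] O[a,s',o] alpha(s')
            g = R[a].copy()
            for o in range(n_obs):
                proj = gamma * (T[a] * O[a][:, o]) @ alphas.T   # (S, n_a)
                g += proj[:, np.argmax(b @ proj)]               # best alpha for (a, o) at b
            if b @ g > best_val:
                best_val, best_vec = b @ g, g
        new_alphas.append(best_vec)
    return np.vstack(new_alphas)
```

Restricting backups to reachable belief points keeps the number of alpha-vectors bounded by the number of sampled beliefs rather than growing over the whole simplex, which is what makes the online, real-time setting tractable.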
Keywords/Search Tags: partially observable Markov decision processes, belief state space, point-based online value iteration, Bayesian reinforcement learning, wireless sensor networks