
On-line neurodynamic programming: From concepts to configurations and evaluations

Posted on: 2000-01-02
Degree: Ph.D.
Type: Dissertation
University: Arizona State University
Candidate: Wang, Yu-tsung
Full Text: PDF
GTID: 1468390014462700
Subject: Engineering
Abstract/Summary:
This dissertation discusses some basic notions of on-line learning control systems in the general framework of neuro-dynamic programming (NDP). The objective of the learning controller is to optimize a given performance measure by learning to create appropriate control actions through interaction with the underlying environment. The controller is designed to learn to perform better over time despite having no prior knowledge of the system. Moreover, a complete model of the system under consideration may not be available; instead, only on-line sampled measurements from the system are available. The feedback from the environment is also less descriptive, in the sense that only indicative signals, such as binary-valued signals signifying either a success or a failure, are available at the end of a task. This dissertation introduces several configurations for implementing NDP and evaluates their performance. In all of the proposed implementations, the state measurements are the inputs to the NDP. Control actions are then generated according to the (processed) state measurements. A critic network serves the purpose of 'monitoring' the performance of the controller to achieve a given optimality. Detailed performance evaluations of this learning controller are provided for a single cart-pole problem and a triple-link inverted pendulum problem.
The main contributions of the dissertation are as follows. (1) It provides systematic approaches to implementing reinforcement learning systems through computer models. The consistency and reliability of these implementations are evidenced by systematic performance evaluations and some analytic guidelines. (2) The implementations of the NDP design developed in the dissertation are potentially scalable to large problems. The NDP design circumvents the 'curse of dimensionality' even further by applying the concept of 'topology preserving' in the initial learning stage. (3) Learning at the different levels of the proposed reinforcement learning system takes place 'on-line'. This is truly a learning 'on the fly' system. It represents an advance over existing algorithms in mimicking the human decision-making process.
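The critic idea summarized above can be illustrated with a small sketch. This is not the dissertation's configuration: it assumes a linear critic, a generic temporal-difference (TD(0)) update, and the commonly used cart-pole failure limits of 12 degrees and 2.4 m. All function names, the discount factor, and the learning rate are hypothetical choices made for illustration only.

```python
import math

GAMMA = 0.95  # discount factor (assumed value, not from the dissertation)


def failure_signal(theta, x, theta_limit=math.radians(12), x_limit=2.4):
    """Binary reinforcement: 0.0 while the task is succeeding, -1.0 on failure.

    The limits are the commonly used cart-pole thresholds (an assumption here).
    """
    return -1.0 if abs(theta) > theta_limit or abs(x) > x_limit else 0.0


def critic_value(w, state):
    """Linear stand-in for the critic network: J(state) = w . state."""
    return sum(wi * si for wi, si in zip(w, state))


def action(v, state):
    """Linear stand-in for the action network, squashed into [-1, 1]."""
    return math.tanh(sum(vi * si for vi, si in zip(v, state)))


def td_error(w, prev_state, state, r, failed):
    """Generic TD(0) error; no bootstrapping from a failed (terminal) state."""
    target = r if failed else r + GAMMA * critic_value(w, state)
    return target - critic_value(w, prev_state)


def critic_update(w, prev_state, delta, lr=0.1):
    """One on-line gradient step on the critic weights."""
    return [wi + lr * delta * si for wi, si in zip(w, prev_state)]
```

In use, each sampled state would be passed to `action` to produce a control signal, and the resulting `failure_signal` would drive `td_error` and `critic_update` one step at a time, so the critic is adjusted on-line without a system model.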
Keywords/Search Tags: System, On-line, NDP, Dissertation