On-line neurodynamic programming: From concepts to configurations and evaluations

Posted on:2000-01-02

Degree:Ph.D

Type:Dissertation

University:Arizona State University

Candidate:Wang, Yu-tsung

GTID:1468390014462700

Subject:Engineering

Abstract/Summary:

This dissertation discusses some basic notions of on-line learning control systems in the general framework of neuro-dynamic programming (NDP). The objective of the learning controller is to optimize a given performance measure by learning to generate appropriate control actions through interaction with the underlying environment. The controller is designed to learn to perform better over time despite having no prior knowledge of the system. Moreover, a complete model of the system under consideration may not be available; instead, on-line sampled measurements from the system are available. The feedback from the environment is also less descriptive, in the sense that only indicative signals, such as binary-valued signals signifying either a success or a failure, are available at the end of a task. This dissertation introduces several configurations for implementing the NDP and evaluates their performance. In all of the proposed implementations, the state measurements are the inputs to the NDP. Control actions are then generated according to the (processed) state measurements. A critic network serves the purpose of 'monitoring' the performance of the controller to achieve a given optimality. Detailed performance evaluations of this learning controller are provided for a single cart-pole problem and a triple-link inverted pendulum problem.

The main contributions of the dissertation are as follows. (1) It provides systematic approaches to implementing reinforcement learning systems through computer models. The consistency and reliability of these implementations are evidenced by systematic performance evaluations and some analytic guidelines. (2) The implementations of the NDP design developed in the dissertation are potentially scalable to large problems. The NDP design circumvents the 'curse of dimensionality' even further by applying the concept of 'topology preserving' in the initial learning stage. (3) Learning at the different levels of the proposed reinforcement learning system takes place 'on-line'. This is truly a learning 'on the fly' system. It represents an advance over existing algorithms in mimicking the human decision-making process.
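
The controller/critic arrangement described in the abstract can be illustrated with a small sketch. The abstract does not give the network structures or update rules, so everything below is an assumption made for illustration: a linear critic trained by a temporal-difference rule, a single tanh action unit, the discount factor and learning rates, and the names `Actor`, `Critic`, `reinforcement` and `update_actor`. It is not the dissertation's implementation.

```python
import math

GAMMA = 0.95  # discount factor (assumed value, not from the abstract)


def reinforcement(failed):
    """Binary signal: 0 while the task runs, -1 when it ends in failure."""
    return -1.0 if failed else 0.0


class Actor:
    """Maps state measurements to a bounded control action u in (-1, 1)."""

    def __init__(self, n_states, lr=0.05):
        self.v = [0.0] * n_states
        self.lr = lr

    def act(self, x):
        return math.tanh(sum(vi * xi for vi, xi in zip(self.v, x)))


class Critic:
    """Linear estimate J(x, u) of the discounted future reinforcement."""

    def __init__(self, n_states, lr=0.1):
        self.w = [0.0] * (n_states + 1)  # last weight multiplies the action
        self.lr = lr

    def value(self, x, u):
        return sum(wi * zi for wi, zi in zip(self.w, list(x) + [u]))

    def update(self, x, u, r, x_next, u_next, done):
        # Temporal-difference target; no bootstrap term once the task has ended.
        target = r if done else r + GAMMA * self.value(x_next, u_next)
        delta = target - self.value(x, u)
        z = list(x) + [u]
        self.w = [wi + self.lr * delta * zi for wi, zi in zip(self.w, z)]
        return delta


def update_actor(actor, critic, x, goal=0.0):
    """One gradient step on 0.5 * (J - goal)^2 with respect to the actor weights."""
    u = actor.act(x)
    error = critic.value(x, u) - goal
    dJ_du = critic.w[-1]   # sensitivity of the linear critic to the action
    du_dnet = 1.0 - u * u  # derivative of tanh
    actor.v = [vi - actor.lr * error * dJ_du * du_dnet * xi
               for vi, xi in zip(actor.v, x)]
```

In an on-line loop under these assumptions, each sampled state would be passed to `Actor.act`, the resulting action applied to the plant, and `Critic.update` and `update_actor` called once per time step, with `reinforcement(True)` supplied only when the task ends in failure (for example, when the pole falls).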

Keywords/Search Tags:

System, On-line, NDP, Dissertation

Related items

1	Dalian University Of Technology Master, Bo Dissertation Database System
2	Learning as a nonlinear line of attraction for pattern association, classification and recognition
3	Design And Implementation Of Dissertation Management System
4	The Development And Application Of University Dissertation Management System
5	Design And Realization Of The Online Dissertation Assessment System Based On .NET
6	Metamaterial-based transmission line components and antennas
7	Theory and application of point-line kinematics
8	Web-network Teaching Platform Construction And Realization
9	The Design And Achievement Of The Guiding System On The Design Project For Graduation (Dissertation) Based On CBD
10	The Research And Design Of Support Service System For Diploma Project (Dissertation) Based On Internet