Research On The Reinforcement Learning Method And Its Application

Posted on: 2008-11-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B Q Huang
Full Text: PDF
GTID: 1118360242475998
Subject: Control theory and control engineering
Abstract/Summary:
Reinforcement learning is an important machine learning method. It learns the optimal policy of a dynamic system by observing the environment state, and it improves its behavior through trial-and-error interaction with the environment. Because it requires little prior knowledge and can learn online in real-time environments, it has been studied extensively in intelligent control and machine learning.

The aim of reinforcement learning is to learn a mapping from the state space to the action space; in essence, it approximates this "state-action" mapping with a parameterized function. Conventional reinforcement learning methods such as Q-learning, TD learning, and Sarsa share one characteristic: they estimate only the value function, and action selection is determined entirely by that estimate. Generalization over the value function and the policy space is usually based on the Adaptive Heuristic Critic (AHC) proposed by Barto. This approach divides the continuous state space into a fixed number of subspaces, which leads to a combinatorial explosion of states, the "curse of dimensionality", when the state space is continuous. The continuous input space therefore has to be quantized (discretized) to reduce its complexity. This thesis adopts the normalized radial basis function (NRBF) as a local function approximator to represent the state space and proposes an adaptive state space construction strategy based on NRBF (ASC-NRBF).

Fuzzy control systems are simple and have been widely used in many areas in recent years. Obtaining good fuzzy rules and membership functions is a key issue in fuzzy controller design. Generally, the rule base and membership functions are derived from experience, which cannot guarantee an optimal control effect.
Genetic algorithms have attracted the attention of many researchers because they are global optimization algorithms. This thesis proposes a hierarchical GA fuzzy reinforcement learning (HGAFRL) system that adjusts the fuzzy rules and membership functions adaptively, which improves the learning efficiency of the system.

An agent simulates a human being, while a multi-agent system simulates human society. Since learning, communication, and collaboration are essential characteristics of human beings, research on Distributed Reinforcement Learning (DRL) in multi-agent systems is of great importance. Nevertheless, existing DRL algorithms suffer from the difficulty of Structural Credit Assignment (SCA), slow learning, and other problems, so their fields of application are strongly restricted. This thesis studies DRL theory in depth and presents preliminary solutions to some of the existing problems.

The main contributions and achievements of this dissertation are given below:

(1) To address the combinatorial explosion in continuous, high-dimensional state spaces, an adaptive state space construction strategy based on the NRBF (ASC-NRBF) is proposed. The normalized radial basis function (NRBF) is adopted as the local function approximator and combined with AHC reinforcement learning, which enables the system to allocate an appropriate number and size of basis functions automatically. Compared with conventional state space construction methods, the proposed method achieves high performance with few RBFs and offers rapid learning, high stability, and strong robustness.

(2) A hierarchical GA fuzzy reinforcement learning (HGAFRL) system is proposed; it is a reinforcement learning system based on the Actor-Critic architecture. It consists of an adaptive evaluation network (AEN), an action selection network (ASN), and stochastic action modification (SAM).
The action selection network uses the hierarchical GA fuzzy adaptive controller, which uses control genes to delete redundant fuzzy sets and rules, thereby increasing the flexibility of the membership functions and optimizing the structure and parameters of the fuzzy adaptive control network.

(3) An improved distributed Q-learning algorithm is presented for multi-agent systems (MAS). During learning, each agent learns the other agents' policies and the influence of the environment by observing the other agents' behavior and keeping statistics on it, and uses this information to establish its reward and subsequent-state functions. By using behavior probability estimation and joint-action statistics, the improved distributed Q-learning algorithm can, in theory, guarantee both selection of the optimal joint action and convergence of the algorithm.

(4) For multi-agent systems, a reinforcement learning method is given for obtaining an environment model that is independent of the task. Using the environment model reduces the learning time. In a distributed multi-agent system, the environment model is constructed quickly by sharing the experience of each agent. Grid world simulation results demonstrate the validity and convergence of the algorithm.
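
As an illustration of contribution (1), the following is a minimal Python sketch of a normalized radial basis function approximator used as a critic with a TD(0) weight update. It is not the thesis's ASC-NRBF algorithm: the abstract does not give the rule for allocating basis functions, so the centers and a single shared width are fixed by hand here. The function names (`nrbf_features`, `critic_value`, `td0_update`) and the parameters `alpha` and `gamma` are assumptions made for this example only.

```python
import math


def nrbf_features(x, centers, width):
    """Normalized Gaussian RBF activations of state x (they sum to 1).

    Assumption: every basis function shares the same width.
    """
    # Exponent of each Gaussian: -||x - c||^2 / (2 * width^2)
    exponents = [
        -sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2.0 * width ** 2)
        for c in centers
    ]
    # Subtracting the largest exponent leaves the normalized result
    # unchanged but prevents the sum from underflowing to zero.
    top = max(exponents)
    raw = [math.exp(e - top) for e in exponents]
    total = sum(raw)
    return [r / total for r in raw]


def critic_value(x, centers, width, weights):
    """Linear value estimate V(x) = sum_i w_i * phi_i(x)."""
    phi = nrbf_features(x, centers, width)
    return sum(w * p for w, p in zip(weights, phi))


def td0_update(weights, x, reward, x_next, centers, width,
               alpha=0.1, gamma=0.95):
    """One TD(0) step on the critic weights; returns (new_weights, td_error)."""
    phi = nrbf_features(x, centers, width)
    td_error = (reward
                + gamma * critic_value(x_next, centers, width, weights)
                - critic_value(x, centers, width, weights))
    new_weights = [w + alpha * td_error * p for w, p in zip(weights, phi)]
    return new_weights, td_error


if __name__ == "__main__":
    # Hypothetical one-dimensional state space with two hand-placed centers.
    centers = [(0.0,), (1.0,)]
    weights = [0.0, 0.0]
    print(nrbf_features((0.5,), centers, width=0.5))
    weights, td_error = td0_update(weights, (0.5,), 1.0, (0.5,),
                                   centers, width=0.5)
    print(weights, td_error)
```

Because the activations are normalized, a state far from every center still gets a well-defined representation dominated by the nearest basis function, which is one reason normalized RBFs are attractive as local approximators over continuous state spaces.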
Keywords/Search Tags: Reinforcement learning (RL), normalized radial basis function (NRBF), function approximation, fuzzy control, hierarchical GA, neural network, multi-agent, distributed Q-learning, joint-action