Neural Network-Based Research On Reinforcement Learning In Continuous State Space

Posted on:2017-05-24

Degree:Master

Type:Thesis

Country:China

Candidate:T T Wang

Full Text:PDF

GTID:2308330509955310

Subject:Software engineering

Abstract/Summary:

Reinforcement learning has been applied in many areas of artificial intelligence, including industrial production, elevator scheduling, and path planning, where it is used to solve problems in stochastic or uncertain dynamic systems. Because most of these problems have continuous state spaces, classical reinforcement learning algorithms based on look-up tables are not effective. The generalization and abstraction abilities of neural networks can be brought into reinforcement learning to address this. This thesis focuses on the problems that a continuous state space may bring. Building on existing reinforcement learning algorithms, it uses neural networks with eligibility traces to approximate the value function and proposes two new neural-network-based reinforcement learning algorithms for continuous state spaces. The aim is to strengthen the abstraction and generalization ability of reinforcement learning in continuous state spaces and to improve its applicability to real-world production and daily life. The main research contents are as follows:

First, for continuous state spaces, we study a reinforcement learning algorithm that combines an RBF neural network with eligibility traces. The thesis uses the generalization ability of the RBF neural network to solve the function approximation problem in continuous state space. At the same time, eligibility traces are introduced into the neural network, so that the output of each node affects only the weights directly related to it, and that effect is preserved over time. This is equivalent to updating the Q values of all visited state-action pairs in each iteration. The algorithm not only handles the continuous state space but also speeds up convergence on the task. Finally, the performance of the algorithm is verified on the Mountain Car platform.

Second, since the research above shows that eligibility traces can improve the efficiency of weight updates in the neural network, we go on to study a reinforcement learning algorithm that combines a hybrid neural network (an extreme learning machine, ELM, and a backpropagation, BP, network). The algorithm uses the generalization ability of neural networks and the actor-critic architecture to solve the problem of continuous state space. In this algorithm, the BP network acts as the action network (actor), mapping the state to the actual action, and the ELM network acts as the evaluation network (critic), which approximates the value function and outputs the evaluation of the strategy. A sliding time-window mechanism reduces the size of the sample space to a certain extent, and an eligibility mechanism accelerates the updating of the network weights. After training, the algorithm can effectively solve problems in continuous state spaces and has a good convergence rate. Finally, inverted pendulum simulation experiments verify the feasibility of the algorithm.
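The first idea in the abstract (an RBF network with eligibility traces approximating Q values on Mountain Car) can be illustrated with a minimal sketch. This is not the thesis's algorithm: it assumes a fixed grid of Gaussian RBF units with a linear output layer, Sarsa(λ) with accumulating traces, and hand-written Mountain Car dynamics. All names and hyperparameter values (`GRID`, `SIGMA`, `ALPHA`, `GAMMA`, `LAMBDA`, `EPSILON`) are hypothetical choices made for illustration.

```python
import math
import random

# Hypothetical settings for this sketch; none of these values come from the thesis.
N_ACTIONS = 3                 # push left, no push, push right
GRID = 6                      # RBF centers per state dimension
SIGMA = 1.0 / (GRID - 1)      # RBF width in normalized state units
ALPHA, GAMMA, LAMBDA, EPSILON = 0.05, 0.99, 0.9, 0.05

# Centers on a regular grid over the normalized (position, velocity) unit square.
CENTERS = [(i / (GRID - 1), j / (GRID - 1)) for i in range(GRID) for j in range(GRID)]


def normalize(pos, vel):
    """Scale a Mountain Car state into the unit square."""
    return (pos + 1.2) / 1.7, (vel + 0.07) / 0.14


def rbf_features(pos, vel):
    """Hidden-layer activations: one Gaussian per center, normalized to sum to 1."""
    x, y = normalize(pos, vel)
    acts = [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * SIGMA ** 2))
            for cx, cy in CENTERS]
    total = sum(acts)
    return [a / total for a in acts]


def q_value(weights, phi, action):
    """Linear output layer: Q(s, a) is the dot product of weights[a] and phi."""
    return sum(w * f for w, f in zip(weights[action], phi))


def choose_action(weights, phi, rng):
    """Epsilon-greedy action selection over the approximated Q values."""
    if rng.random() < EPSILON:
        return rng.randrange(N_ACTIONS)
    qs = [q_value(weights, phi, a) for a in range(N_ACTIONS)]
    return qs.index(max(qs))


def step(pos, vel, action):
    """Classic Mountain Car dynamics; returns (pos, vel, reward, done)."""
    vel = min(max(vel + 0.001 * (action - 1) - 0.0025 * math.cos(3 * pos), -0.07), 0.07)
    pos = min(max(pos + vel, -1.2), 0.5)
    if pos == -1.2:           # hitting the left wall stops the car
        vel = 0.0
    return pos, vel, -1.0, pos >= 0.5


def sarsa_lambda_update(weights, traces, phi, action, reward, q_next, done):
    """One Sarsa(lambda) step; mutates weights and traces, returns the TD error."""
    delta = reward + (0.0 if done else GAMMA * q_next) - q_value(weights, phi, action)
    for a in range(N_ACTIONS):
        for i in range(len(phi)):
            # Decay every trace, then add the gradient of Q(s, action) for the taken action.
            traces[a][i] = GAMMA * LAMBDA * traces[a][i] + (phi[i] if a == action else 0.0)
            # Every weight with a nonzero trace shares in the current TD error.
            weights[a][i] += ALPHA * delta * traces[a][i]
    return delta


def run_episode(weights, rng, max_steps=1000):
    """Run one episode, updating weights in place; returns the number of steps taken."""
    traces = [[0.0] * len(CENTERS) for _ in range(N_ACTIONS)]
    pos, vel = rng.uniform(-0.6, -0.4), 0.0
    phi = rbf_features(pos, vel)
    action = choose_action(weights, phi, rng)
    for t in range(1, max_steps + 1):
        pos, vel, reward, done = step(pos, vel, action)
        next_phi = rbf_features(pos, vel)
        next_action = choose_action(weights, next_phi, rng)
        q_next = q_value(weights, next_phi, next_action)
        sarsa_lambda_update(weights, traces, phi, action, reward, q_next, done)
        if done:
            return t
        phi, action = next_phi, next_action
    return max_steps
```

To try it, create `weights = [[0.0] * len(CENTERS) for _ in range(N_ACTIONS)]` and call `run_episode(weights, random.Random(0))` repeatedly, watching the returned step count. The trace update is what lets a single TD error adjust the weights of all recently visited state-action pairs at once, which is the effect the abstract describes.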

Keywords/Search Tags:

reinforcement learning, continuous state space, neural network, eligibility traces

Related items

1	Research On Parallel Reinforcement Learning
2	Research On Reinforcement Learning Oriented Model Learning Algorithms
3	Encoding Robot Topology Information For Deep Reinforcement Learning With Continuous Action Space
4	Research On Reinforcement Learning In Continuous Spaces
5	Reinforcement Learning For Fuzzy Multi-objective Cloud Resource Scheduling Problem
6	On Reinforcement Learning Control For Bionic Underwater Robots
7	Research On Multiagent Reinforcement Learning Algorithm In Continuous Action Space
8	Improving The Generalization Of Reinforcement Learning In Continuous Control Via State Instability Regularizer
9	Hierarchical reinforcement learning in continuous state and multi-agent environments
10	Learning state and action space hierarchies for reinforcement learning using action-dependent partitioning