Recursive Least-squares Reinforcement Learning Based On An Improved Extreme Learning Machine

Posted on:2018-06-02

Degree:Master

Type:Thesis

Country:China

Candidate:B M Huang

GTID:2348330518492936

Subject:Control Science and Engineering

Abstract/Summary:

Policy evaluation and policy optimization are the two central problems of reinforcement learning. The most widely used policy evaluation method, known in machine learning as predictive learning, is temporal difference (TD) learning, which also underpins Q-learning, the most widely used policy optimization algorithm for learning control. This thesis studies both policy evaluation and policy optimization algorithms. For policy evaluation, to meet the accuracy and computation speed requirements of value function approximation, a recursive method is introduced into the least-squares temporal difference (LSTD) algorithm; this eliminates the matrix inversion in the least-squares solution and thereby reduces computational complexity. In addition, because the value functions in most reinforcement learning problems are monotonic, the sigmoid activation function in the extreme learning machine (ELM) network is replaced by the Softplus function, a smooth approximation with one-sided suppression, and a regularization factor is introduced to shrink the function space and avoid over-fitting. Experiments on the generalized Hop-world and inverted pendulum problems show that the proposed algorithm improves learning speed and stability compared with ELM-based LSTD, and improves accuracy compared with LSTD based on radial basis functions. For policy optimization, the improved IELM-LSTD policy evaluation algorithm is combined with Q-learning to solve a best-path-finding problem in a role-playing game.
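The policy evaluation idea in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the class names (`ELMFeatures`, `RecursiveLSTD`), the toy random-walk task, the Gaussian initialization of the hidden layer, and the use of the regularization factor as the initial value `P = I / reg` are all assumptions made for illustration. The sketch keeps the hidden-layer weights fixed and random, as in an ELM, uses Softplus as the activation, and applies the Sherman-Morrison identity so that no matrix inversion is needed at any step.

```python
import numpy as np


def softplus(x):
    """Numerically stable Softplus: log(1 + exp(x))."""
    return np.logaddexp(0.0, x)


class ELMFeatures:
    """Random, fixed hidden layer with Softplus activation (ELM-style)."""

    def __init__(self, state_dim, n_hidden, rng):
        # Hidden weights are drawn once and never trained (assumed Gaussian).
        self.W = rng.normal(size=(n_hidden, state_dim))
        self.b = rng.normal(size=n_hidden)

    def __call__(self, state):
        return softplus(self.W @ np.atleast_1d(state) + self.b)


class RecursiveLSTD:
    """Recursive LSTD(0) for V(s) ~ theta . phi(s), with no explicit inversion."""

    def __init__(self, n_features, gamma=0.9, reg=1.0):
        self.gamma = gamma
        # P tracks the inverse of A = reg * I + sum_t phi_t (phi_t - gamma * phi'_t)^T.
        self.P = np.eye(n_features) / reg
        self.theta = np.zeros(n_features)

    def update(self, phi, reward, phi_next):
        d = phi - self.gamma * phi_next
        P_phi = self.P @ phi
        k = P_phi / (1.0 + d @ P_phi)           # gain vector
        self.theta = self.theta + k * (reward - d @ self.theta)
        self.P = self.P - np.outer(k, d @ self.P)  # Sherman-Morrison update

    def value(self, phi):
        return float(self.theta @ phi)


def random_walk_transitions(n_steps, rng):
    """Toy task (hypothetical): noisy walk on [0, 1], reward = next state."""
    s = 0.5
    for _ in range(n_steps):
        s_next = float(np.clip(s + rng.normal(scale=0.1), 0.0, 1.0))
        yield s, s_next, s_next
        s = s_next


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = ELMFeatures(state_dim=1, n_hidden=8, rng=rng)
    learner = RecursiveLSTD(n_features=8, gamma=0.9, reg=1.0)
    for s, r, s_next in random_walk_transitions(300, rng):
        learner.update(features(s), r, features(s_next))
    print("estimated V(0.5):", learner.value(features(0.5)))
```

Each update costs O(n^2) in the number of hidden units, whereas re-solving the batch least-squares system after every sample would cost O(n^3). A larger `reg` starts `P` closer to zero, which damps early updates and acts as a ridge penalty on `theta`.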

Keywords/Search Tags:

reinforcement learning, policy evaluation, policy optimization, recursive least-squares temporal difference learning, extreme learning machine

Related items

1	Identification And Internal Model Control Of Fractional Order Systems
2	Research On Regularized Least Squares Policy Evaluation Algorithms In Reinforcement Learning
3	Research On Application Of Reinforcement Learning In Swing-up And Balance Control Of Inverted Pendulum
4	Robust Policy Gradient Algorithm Based On Actor-Critic In Deep Reinforcement Learning
5	Research On Policy-Constrained Reinforcement Learning
6	Research On Fast Policy Gradient Algorithms Of Reinforcement Learning Based On Adaptive Learning Rate
7	Research On Execution-time Policy Evaluation And Policy Evolution In Open Environments
8	Research On Reinforcement Learning Methods Based On Direct Policy Search
9	Research On Multiagent Policy Optimization Based On Deep Reinforcement Learning
10	Research On Accelerating The Convergence Of Off-policy Temporal Difference Learning