Recursive Least-squares Reinforcement Learning Based On An Improved Extreme Learning Machine

Posted on: 2018-06-02 | Degree: Master | Type: Thesis
Country: China | Candidate: B M Huang | Full Text: PDF
GTID: 2348330518492936 | Subject: Control Science and Engineering
Abstract/Summary:
Policy evaluation and policy optimization are the two major problems in reinforcement learning. The most popular approach to policy evaluation, known in machine learning as the predictive learning process, is the temporal difference (TD) reinforcement learning algorithm. It also provides a strong foundation for Q-learning, the most popular policy optimization algorithm for learning control problems. This thesis studies both policy evaluation and policy optimization algorithms. For policy evaluation, to meet the accuracy and computation-speed requirements of value approximation, a recursive method is introduced into the least-squares temporal difference (LSTD) algorithm; this eliminates the matrix inversion step of the least-squares solution and thereby reduces the computational complexity of the proposed algorithm. In addition, the sigmoid activation function in the extreme learning machine (ELM) network is replaced with the Softplus function, a smooth approximation with one-sided suppression, because the value functions in most reinforcement learning problems are monotonic. A regularization factor is then introduced to shrink the function space effectively and avoid over-fitting. Experiments on the generalized Hop-world and inverted pendulum problems demonstrate that the proposed algorithm improves learning speed and stability compared with the LSTD algorithm based on the extreme learning machine, and improves accuracy compared with the LSTD algorithm based on radial basis functions. For policy optimization, the improved IELM-LSTD policy evaluation algorithm is combined with Q-learning to solve a best-path-finding problem in a role-playing game.
Keywords/Search Tags: reinforcement learning, policy evaluation, policy optimization, recursive least-squares temporal difference learning, extreme learning machine
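
The abstract describes two ideas that a short sketch can make concrete: replacing the matrix inversion in least-squares TD with a recursive update, and using Softplus hidden units in an ELM-style random feature layer. The Python below is an illustrative sketch only, not the thesis implementation. The class names (`SoftplusELMFeatures`, `RecursiveLSTD`), the use of the Sherman-Morrison rank-one update, the choice to apply regularization by initializing the inverse matrix to `I / delta`, and the toy five-state cyclic chain are all assumptions made for illustration; the abstract does not specify them.

```python
import math
import random


def softplus(x):
    """Numerically stable softplus, log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


class SoftplusELMFeatures:
    """Hypothetical ELM-style hidden layer: random fixed weights, Softplus units.

    Only the output weights (theta in RecursiveLSTD) are learned.
    """

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        self.weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
                        for _ in range(n_hidden)]
        self.biases = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]

    def __call__(self, state):
        return [softplus(sum(w * s for w, s in zip(row, state)) + b)
                for row, b in zip(self.weights, self.biases)]


class RecursiveLSTD:
    """Recursive LSTD(0) sketch: tracks P = (delta*I + sum phi d^T)^-1 directly."""

    def __init__(self, n_features, gamma=0.9, delta=1.0):
        self.gamma = gamma
        self.theta = [0.0] * n_features
        # Assumption: regularization enters through the initial matrix I / delta.
        self.P = [[(1.0 / delta if i == j else 0.0) for j in range(n_features)]
                  for i in range(n_features)]

    def update(self, phi, reward, phi_next):
        n = len(phi)
        d = [p - self.gamma * q for p, q in zip(phi, phi_next)]
        # Both products use the old P, as the Sherman-Morrison identity requires.
        P_phi = [sum(self.P[i][j] * phi[j] for j in range(n)) for i in range(n)]
        d_P = [sum(d[i] * self.P[i][j] for i in range(n)) for j in range(n)]
        denom = 1.0 + sum(d[i] * P_phi[i] for i in range(n))  # assumed nonzero
        td_error = reward - sum(d[i] * self.theta[i] for i in range(n))
        for i in range(n):
            self.theta[i] += P_phi[i] / denom * td_error
            for j in range(n):
                self.P[i][j] -= P_phi[i] * d_P[j] / denom

    def value(self, phi):
        return sum(t * p for t, p in zip(self.theta, phi))


if __name__ == "__main__":
    # Toy cyclic chain (an assumption, not a benchmark from the thesis):
    # the agent always steps right; wrapping from state 4 to 0 pays reward 1.
    features = SoftplusELMFeatures(n_inputs=1, n_hidden=8, seed=0)
    learner = RecursiveLSTD(n_features=8, gamma=0.9, delta=1.0)
    n_states, state = 5, 0
    for _ in range(500):
        next_state = (state + 1) % n_states
        reward = 1.0 if next_state == 0 else 0.0
        learner.update(features([state / (n_states - 1)]), reward,
                       features([next_state / (n_states - 1)]))
        state = next_state
    for s in range(n_states):
        print(s, round(learner.value(features([s / (n_states - 1)])), 3))
```

Each call to `update` costs O(n^2) in the number of hidden units, compared with O(n^3) for solving the least-squares system from scratch, which is the kind of saving the abstract attributes to the recursive formulation. A larger `delta` regularizes more strongly at the cost of a slower initial fit.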