
Research On Approximate Reinforcement Learning With Kernel Methods

Posted on: 2018-03-07    Degree: Master    Type: Thesis
Country: China    Candidate: H J Zhu    Full Text: PDF
GTID: 2348330542965278    Subject: Computer Science and Technology
Abstract/Summary:
Reinforcement learning (RL) is a machine-learning framework for solving sequential decision problems by continuously interacting with the environment and maximizing a reward signal. RL can learn from data online without labels or prior knowledge of the environment. However, it suffers from slow convergence and low approximation precision, and it is difficult for RL to handle tasks with continuous action spaces. Kernel methods, as efficient non-parametric approximators, can help RL converge faster and obtain more accurate results. Policy gradient methods, an important branch of policy search, are well suited to continuous action spaces. This thesis focuses on model-free sequential decision problems with continuous state and action spaces. By integrating kernel methods with policy gradient methods, we propose several function-approximation RL algorithms. Minimal illustrative sketches of the main components follow the abstract.

i. For kernel-based RL, the choice of sparsification method and kernel function affects algorithm performance, and traditional sparsification methods often yield low estimation precision and long execution times. We therefore propose a new clustering-based sample sparsification method (CNC) consisting of a preparation phase and a learning phase. In the preparation phase, a clustering-based procedure discovers the distribution of the samples, which improves the estimation accuracy of CNC. In the online learning phase, a novelty criterion with low time complexity is adopted, which satisfies the timing requirements of online learning. Based on Sarsa(λ), we combine CNC with a selective kernel function (CNC-SK) to approximate the value function and propose the clustering-based selective kernel Sarsa(λ) algorithm (CSKS(λ)). Finally, we demonstrate its control performance on a control problem with a continuous state space.

ii. Policy evaluation plays an important role in policy search methods, and the true online temporal difference algorithm (TOTD(λ)) is an efficient policy evaluation algorithm. We combine TOTD(λ) with CNC-SK to propose the kernel-based TOTD(λ) algorithm (TOKTD(λ)). We first demonstrate its policy evaluation performance on a control problem with a continuous state space, and then apply TOKTD(λ) to evaluate policies within policy search methods. Finally, we show on a problem with continuous state and action spaces that TOKTD(λ) improves the efficiency of policy search.

iii. We improve the computation of the natural gradient using the idea behind TOTD(λ). Combining this with the two contributions above, we propose a kernel-based true online natural gradient actor-critic algorithm (TOKNAC), which copes with continuous-space problems without any knowledge of the environment. TOKNAC applies CNC-SK to approximate both the value function and the policy. The critic evaluates the policy with TOKTD(λ), and the actor computes the natural gradient in the true online manner of TOTD(λ). Finally, we demonstrate the control efficiency of TOKNAC on control problems with continuous state and action spaces.
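To make the online sparsification step of contribution i concrete, the following is a minimal sketch of a distance-based novelty criterion over a Gaussian kernel: a new sample is added to the dictionary only if its kernel distance to every retained sample exceeds a threshold. The class and parameter names (KernelDictionary, delta, sigma) are illustrative assumptions, not the thesis code, and the clustering-based preparation phase of CNC is not shown.

# Minimal sketch of an online novelty-criterion check for kernel sparsification.
# All names and thresholds are illustrative, not the thesis implementation.
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

class KernelDictionary:
    def __init__(self, delta=0.1, sigma=1.0):
        self.samples = []      # retained (sparsified) samples
        self.delta = delta     # novelty threshold
        self.sigma = sigma

    def consider(self, x):
        """Add x only if it is sufficiently novel w.r.t. the dictionary."""
        if not self.samples:
            self.samples.append(x)
            return True
        # squared kernel distance to the closest dictionary member
        d2 = min(2.0 - 2.0 * gaussian_kernel(x, s, self.sigma) for s in self.samples)
        if d2 > self.delta:
            self.samples.append(x)
            return True
        return False

Because consider() returns whether the sample was retained, the caller can grow the feature representation (and the corresponding weight vectors) only when the dictionary actually changes, which keeps the per-step cost low during online learning.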
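The critic side of contributions ii and iii can be illustrated with the standard true online TD(λ) update applied to kernel features, where phi(s) is assumed to be the vector of kernel evaluations of state s against the sparsified dictionary. This is a generic sketch of true online TD(λ) with linear function approximation, not the thesis's TOKTD(λ) code; the step size, discount factor, and trace-decay values are placeholders.

# A minimal sketch of true online TD(lambda) with linear value approximation
# over kernel features phi(s) (assumed form; not the thesis implementation).
import numpy as np

class TrueOnlineTDLambda:
    def __init__(self, n_features, alpha=0.05, gamma=0.99, lam=0.9):
        self.theta = np.zeros(n_features)   # value-function weights
        self.e = np.zeros(n_features)       # dutch-style eligibility trace
        self.v_old = 0.0
        self.alpha, self.gamma, self.lam = alpha, gamma, lam

    def update(self, phi, reward, phi_next):
        v = self.theta @ phi
        v_next = self.theta @ phi_next
        delta = reward + self.gamma * v_next - v
        # dutch trace update
        self.e = (self.gamma * self.lam * self.e
                  + phi - self.alpha * self.gamma * self.lam * (self.e @ phi) * phi)
        # true online weight update
        self.theta += (self.alpha * (delta + v - self.v_old) * self.e
                       - self.alpha * (v - self.v_old) * phi)
        self.v_old = v_next

In the kernel setting the feature vector grows whenever the dictionary accepts a new sample; handling that growth (e.g. by padding theta and the trace with zeros) is omitted here for brevity.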
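For the actor side of contribution iii, the sketch below assumes a Gaussian policy whose mean is linear in the kernel features and relies on the standard compatible-features result that the natural policy gradient equals the advantage weights w estimated by the critic, so the actor simply steps along w. The TOTD(λ)-style computation of the natural gradient described in the thesis is not reproduced; all names and step sizes are illustrative assumptions.

# A hedged sketch of a natural-gradient actor step for a Gaussian policy over
# kernel features. It is a generic natural actor-critic step, not TOKNAC itself.
import numpy as np

def grad_log_pi(theta, phi, action, sigma=0.2):
    """Score function of a Gaussian policy with mean theta @ phi and std sigma."""
    return ((action - theta @ phi) / sigma ** 2) * phi

def compatible_critic_step(w, theta, phi, action, td_error, alpha=0.05, sigma=0.2):
    """Incremental update of the compatible advantage weights from the TD error."""
    psi = grad_log_pi(theta, phi, action, sigma)   # compatible feature vector
    return w + alpha * td_error * psi

def natural_actor_step(theta, w_advantage, beta=0.01):
    """Actor update: step along the natural gradient, i.e. the compatible weights w."""
    return theta + beta * w_advantage

Using the compatible weights as the ascent direction avoids forming or inverting the Fisher information matrix explicitly, which is what makes natural actor-critic updates practical in continuous action spaces.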
Keywords/Search Tags: reinforcement learning, kernel method, natural gradient, continuous space, clustering