
Research On Approximate Reinforcement Learning With Kernel Methods

Posted on: 2018-03-07    Degree: Master    Type: Thesis
Country: China    Candidate: H J Zhu    Full Text: PDF
GTID: 2348330542965278    Subject: Computer Science and Technology
Abstract/Summary:
Reinforcement learning (RL) is a machine-learning framework for solving sequential decision problems by continuously interacting with the environment and maximizing a reward signal. RL can learn from data online without labels or prior knowledge of the environment. However, it suffers from slow convergence and low approximation precision, and it is difficult for RL to handle tasks with continuous action spaces. Kernel methods, as efficient non-parametric approximators, can help RL converge faster and obtain more accurate results. Policy gradient methods, an important branch of policy search, are well suited to continuous action spaces. This thesis focuses on model-free sequential decision problems with continuous state and action spaces. By integrating kernel methods with policy gradient methods, we propose several function-approximation RL algorithms. Minimal illustrative sketches of the main components follow the abstract.

i. For kernel-based RL, the choice of sparsification method and kernel function affects algorithm performance, and traditional sparsification methods often yield low estimation precision and long execution times. We therefore propose a new clustering-based sample sparsification method (CNC) consisting of a preparation phase and a learning phase. In the preparation phase, a clustering-based procedure discovers the distribution of the samples, which improves the estimation accuracy of CNC. In the online learning phase, a novelty criterion with low time complexity is adopted, which satisfies the timing requirements of online learning. Based on Sarsa(λ), we combine CNC with a selective kernel function (CNC-SK) to approximate the value function and propose the clustering-based selective kernel Sarsa(λ) algorithm (CSKS(λ)). Finally, we demonstrate its control performance on a control problem with a continuous state space.

ii. Policy evaluation plays an important role in policy search methods, and the true online temporal difference algorithm (TOTD(λ)) is an efficient policy evaluation algorithm. We combine TOTD(λ) with CNC-SK to propose the kernel-based TOTD(λ) algorithm (TOKTD(λ)). We first demonstrate its policy evaluation performance on a control problem with a continuous state space, and then apply TOKTD(λ) to evaluate policies within policy search methods. Finally, we show on a problem with continuous state and action spaces that TOKTD(λ) improves the efficiency of policy search.

iii. We improve the computation of the natural gradient using the idea behind TOTD(λ). Combining this with the two contributions above, we propose a kernel-based true online natural gradient actor-critic algorithm (TOKNAC), which copes with continuous-space problems without any knowledge of the environment. TOKNAC applies CNC-SK to approximate both the value function and the policy. The critic evaluates the policy with TOKTD(λ), and the actor computes the natural gradient in the true online manner of TOTD(λ). Finally, we demonstrate the control efficiency of TOKNAC on control problems with continuous state and action spaces.
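To make the online sparsification step of contribution i concrete, the following is a minimal sketch of a distance-based novelty criterion over a Gaussian kernel: a new sample is added to the dictionary only if its kernel distance to every retained sample exceeds a threshold. The class and parameter names (KernelDictionary, delta, sigma) are illustrative assumptions, not the thesis code, and the clustering-based preparation phase of CNC is not shown.

# Minimal sketch of an online novelty-criterion check for kernel sparsification.
# All names and thresholds are illustrative, not the thesis implementation.
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

class KernelDictionary:
    def __init__(self, delta=0.1, sigma=1.0):
        self.samples = []      # retained (sparsified) samples
        self.delta = delta     # novelty threshold
        self.sigma = sigma

    def consider(self, x):
        """Add x only if it is sufficiently novel w.r.t. the dictionary."""
        if not self.samples:
            self.samples.append(x)
            return True
        # squared kernel distance to the closest dictionary member
        d2 = min(2.0 - 2.0 * gaussian_kernel(x, s, self.sigma) for s in self.samples)
        if d2 > self.delta:
            self.samples.append(x)
            return True
        return False

Because consider() returns whether the sample was retained, the caller can grow the feature representation (and the corresponding weight vectors) only when the dictionary actually changes, which keeps the per-step cost low during online learning.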
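The critic side of contributions ii and iii can be illustrated with the standard true online TD(λ) update applied to kernel features, where phi(s) is assumed to be the vector of kernel evaluations of state s against the sparsified dictionary. This is a generic sketch of true online TD(λ) with linear function approximation, not the thesis's TOKTD(λ) code; the step size, discount factor, and trace-decay values are placeholders.

# A minimal sketch of true online TD(lambda) with linear value approximation
# over kernel features phi(s) (assumed form; not the thesis implementation).
import numpy as np

class TrueOnlineTDLambda:
    def __init__(self, n_features, alpha=0.05, gamma=0.99, lam=0.9):
        self.theta = np.zeros(n_features)   # value-function weights
        self.e = np.zeros(n_features)       # dutch-style eligibility trace
        self.v_old = 0.0
        self.alpha, self.gamma, self.lam = alpha, gamma, lam

    def update(self, phi, reward, phi_next):
        v = self.theta @ phi
        v_next = self.theta @ phi_next
        delta = reward + self.gamma * v_next - v
        # dutch trace update
        self.e = (self.gamma * self.lam * self.e
                  + phi - self.alpha * self.gamma * self.lam * (self.e @ phi) * phi)
        # true online weight update
        self.theta += (self.alpha * (delta + v - self.v_old) * self.e
                       - self.alpha * (v - self.v_old) * phi)
        self.v_old = v_next

In the kernel setting the feature vector grows whenever the dictionary accepts a new sample; handling that growth (e.g. by padding theta and the trace with zeros) is omitted here for brevity.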
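For the actor side of contribution iii, the sketch below assumes a Gaussian policy whose mean is linear in the kernel features and relies on the standard compatible-features result that the natural policy gradient equals the advantage weights w estimated by the critic, so the actor simply steps along w. The TOTD(λ)-style computation of the natural gradient described in the thesis is not reproduced; all names and step sizes are illustrative assumptions.

# A hedged sketch of a natural-gradient actor step for a Gaussian policy over
# kernel features. It is a generic natural actor-critic step, not TOKNAC itself.
import numpy as np

def grad_log_pi(theta, phi, action, sigma=0.2):
    """Score function of a Gaussian policy with mean theta @ phi and std sigma."""
    return ((action - theta @ phi) / sigma ** 2) * phi

def compatible_critic_step(w, theta, phi, action, td_error, alpha=0.05, sigma=0.2):
    """Incremental update of the compatible advantage weights from the TD error."""
    psi = grad_log_pi(theta, phi, action, sigma)   # compatible feature vector
    return w + alpha * td_error * psi

def natural_actor_step(theta, w_advantage, beta=0.01):
    """Actor update: step along the natural gradient, i.e. the compatible weights w."""
    return theta + beta * w_advantage

Using the compatible weights as the ascent direction avoids forming or inverting the Fisher information matrix explicitly, which is what makes natural actor-critic updates practical in continuous action spaces.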
Keywords/Search Tags: reinforcement learning, kernel method, natural gradient, continuous space, clustering