
Research On Temporal Difference Algorithm Based On Kernel Function Approximation

Posted on: 2018-07-21    Degree: Master    Type: Thesis
Country: China    Candidate: C J Sun    Full Text: PDF
GTID: 2348330542965192    Subject: Software engineering
Abstract/Summary:
Reinforcement learning is an important branch of machine learning. An agent continuously interacts with its environment, receives feedback signals, and uses them to optimize its policy. Reinforcement learning is now widely applied to job scheduling, path planning, online learning control, games, and other practical areas. However, using reinforcement learning to solve practical large-scale or continuous-space tasks still faces several difficulties: i. balancing exploration and exploitation; ii. the "curse of dimensionality"; iii. the temporal credit assignment problem. The temporal difference algorithm is an effective method for temporal credit assignment, and function approximation is the general approach to large-scale or continuous-space reinforcement learning tasks. Kernel function approximation is a typical nonparametric approximation with better generalization performance than parametric approximation, but its computational complexity grows with the number of samples. This thesis studies kernel-based temporal difference algorithms and discusses the performance of kernel-based reinforcement learning algorithms. The main research work is as follows:

i. Kernel least-squares temporal difference learning based on approximate samples. To balance exploration and exploitation in large-scale or continuous-space reinforcement learning tasks and avoid the curse of dimensionality, the least-squares temporal difference algorithm is studied under kernel function approximation (a sketch of the standard formulation follows this list). A subset-of-data approximation method based on a maximum-variance criterion is used to process the samples, reducing their number and improving the efficiency of the approximation. Experimental results show that convergence improves after sample approximation.

ii. Sparse kernel-based least-squares temporal difference learning with prioritized sweeping. The subset-of-data approximation method cannot fully characterize the features of large-scale data with drastic changes, so the algorithm is improved: ALD-based (approximate linear dependence) kernel sparsification reduces sample redundancy, and the Sherman-Morrison formula optimizes the matrix updates and reduces the computational complexity (see the Python sketch below). In addition, the idea of prioritized sweeping is exploited to increase the utilization of informative samples. Experimental results show that the algorithm effectively speeds up convergence and improves convergence precision.

iii. Kernel function selection based on sample distribution characteristics. When function approximation is used to solve large-scale or continuous-space reinforcement learning tasks, the curse of dimensionality can also be mitigated, besides sample approximation and sparsification, by choosing a kernel function suited to the features of the samples. Taking the two most common kernels, the Gaussian radial basis function kernel and the polynomial kernel, as examples, the thesis studies how to choose a suitable kernel from the sample distribution characteristics, and then applies the selected kernel in the least-squares temporal difference algorithm to achieve efficient approximation and improve convergence. Experimental results show that the kernel temporal difference algorithm with the distribution-based discriminant has better convergence performance.
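For item i, the following is a minimal sketch of the kernel least-squares temporal difference fixed point as it is commonly formulated in the literature; the notation (dictionary samples s_1, ..., s_n, kernel k, discount factor gamma) is ours and is not taken from the thesis itself.

```latex
% Kernel LSTD, standard formulation (notation ours, not the thesis's).
% The value function is expanded over dictionary samples s_1, ..., s_n:
\[
  V(s) = \sum_{i=1}^{n} \alpha_i \, k(s, s_i), \qquad
  \mathbf{k}_t = \bigl( k(s_t, s_1), \dots, k(s_t, s_n) \bigr)^{\top}
\]
% The coefficients solve the least-squares TD fixed-point equations,
% accumulated over observed transitions (s_t, r_t, s_{t+1}):
\[
  A = \sum_{t} \mathbf{k}_t \bigl( \mathbf{k}_t - \gamma \, \mathbf{k}_{t+1} \bigr)^{\top},
  \qquad
  b = \sum_{t} r_t \, \mathbf{k}_t,
  \qquad
  \alpha = A^{-1} b .
\]
```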
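For item ii, here is a minimal runnable sketch of ALD-based dictionary sparsification together with a Sherman-Morrison rank-one inverse update. The abstract names both techniques but gives no code, so the kernel choice, the threshold nu, and all identifiers below are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # Gaussian RBF kernel; sigma is a hypothetical bandwidth choice.
    return np.exp(-np.linalg.norm(x - y) ** 2 / (2 * sigma ** 2))

class ALDDictionary:
    """ALD sparsification: a new sample joins the dictionary only if its
    kernel feature cannot be approximated, within tolerance nu, by a
    linear combination of the features of the existing members."""
    def __init__(self, kernel, nu=0.1):
        self.kernel, self.nu = kernel, nu
        self.samples = []   # retained dictionary samples
        self.K_inv = None   # inverse of the dictionary Gram matrix

    def consider(self, s):
        if not self.samples:
            self.samples.append(s)
            self.K_inv = np.array([[1.0 / self.kernel(s, s)]])
            return True
        k_vec = np.array([self.kernel(s, d) for d in self.samples])
        c = self.K_inv @ k_vec                 # best approximation coefficients
        delta = self.kernel(s, s) - k_vec @ c  # ALD residual
        if delta > self.nu:                    # not representable: add sample
            self.samples.append(s)
            # Block update of the Gram matrix inverse (no re-inversion).
            n = len(self.samples) - 1
            K_inv_new = np.zeros((n + 1, n + 1))
            K_inv_new[:n, :n] = self.K_inv + np.outer(c, c) / delta
            K_inv_new[:n, n] = -c / delta
            K_inv_new[n, :n] = -c / delta
            K_inv_new[n, n] = 1.0 / delta
            self.K_inv = K_inv_new
            return True
        return False

def sherman_morrison_update(A_inv, u, v):
    # Rank-one inverse update, (A + u v^T)^{-1}, used to refresh the
    # LSTD matrix inverse per transition instead of re-inverting.
    Au = A_inv @ u
    vA = v @ A_inv
    return A_inv - np.outer(Au, vA) / (1.0 + v @ Au)
```

Maintaining A^{-1} through sherman_morrison_update turns each per-sample LSTD update from a cubic-cost matrix inversion into a quadratic-cost rank-one correction, which is the complexity saving the abstract attributes to the Sherman-Morrison formula.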
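For item iii, the thesis's actual discriminant for choosing between the Gaussian RBF and polynomial kernels is not given in this abstract, so the sketch below substitutes a simple spread statistic purely for illustration: tightly clustered samples get the local Gaussian kernel, widely dispersed samples the global polynomial kernel. Every name and threshold here is a hypothetical stand-in.

```python
import numpy as np

def gaussian_rbf(x, y, sigma=1.0):
    # Local kernel: responds mainly to nearby samples.
    return np.exp(-np.linalg.norm(x - y) ** 2 / (2 * sigma ** 2))

def polynomial(x, y, degree=3, c=1.0):
    # Global kernel: every pair of samples interacts.
    return (np.dot(x, y) + c) ** degree

def choose_kernel(samples, spread_threshold=1.0):
    """Illustrative stand-in for a distribution-based discriminant:
    mean distance to the centroid decides local vs. global kernel."""
    X = np.asarray(samples, dtype=float)
    spread = np.mean(np.linalg.norm(X - X.mean(axis=0), axis=1))
    return gaussian_rbf if spread < spread_threshold else polynomial
```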
Keywords/Search Tags: reinforcement learning, kernel function, temporal difference, sample approximation, sparsification