
Research On Temporal Difference Algorithm Based On Kernel Function Approximation

Posted on: 2018-07-21    Degree: Master    Type: Thesis
Country: China    Candidate: C J Sun    Full Text: PDF
GTID: 2348330542965192    Subject: Software engineering
Abstract/Summary:
Reinforcement learning is an important branch of machine learning. An agent continuously interacts with its environment, receives feedback signals, and uses them to optimize its policy. Reinforcement learning is now widely applied to job scheduling, path planning, online learning control, games, and other practical areas. However, using reinforcement learning to solve practical large-scale or continuous-space tasks still faces several difficulties: i. balancing exploration and exploitation; ii. the "curse of dimensionality"; iii. the temporal credit assignment problem. The temporal difference algorithm is an effective method for temporal credit assignment, and function approximation is the general approach to large-scale or continuous-space reinforcement learning tasks. Kernel function approximation is a typical nonparametric approximation with better generalization performance than parametric approximation, but its computational complexity grows with the number of samples. This thesis studies kernel-based temporal difference algorithms and discusses the performance of kernel-based reinforcement learning algorithms. The main research work is as follows:

i. Kernel least-squares temporal difference learning based on approximate samples. To balance exploration and exploitation in large-scale or continuous-space reinforcement learning tasks and avoid the curse of dimensionality, the least-squares temporal difference algorithm is studied under kernel function approximation (a sketch of the standard formulation follows this list). A subset-of-data approximation method based on a maximum-variance criterion is used to process the samples, reducing their number and improving the efficiency of the approximation. Experimental results show that convergence improves after sample approximation.

ii. Sparse kernel-based least-squares temporal difference learning with prioritized sweeping. The subset-of-data approximation method cannot fully characterize the features of large-scale data with drastic changes, so the algorithm is improved: ALD-based (approximate linear dependence) kernel sparsification reduces sample redundancy, and the Sherman-Morrison formula optimizes the matrix updates and reduces the computational complexity (see the Python sketch below). In addition, the idea of prioritized sweeping is exploited to increase the utilization of informative samples. Experimental results show that the algorithm effectively speeds up convergence and improves convergence precision.

iii. Kernel function selection based on sample distribution characteristics. When function approximation is used to solve large-scale or continuous-space reinforcement learning tasks, the curse of dimensionality can also be mitigated, besides sample approximation and sparsification, by choosing a kernel function suited to the features of the samples. Taking the two most common kernels, the Gaussian radial basis function kernel and the polynomial kernel, as examples, the thesis studies how to choose a suitable kernel from the sample distribution characteristics, and then applies the selected kernel in the least-squares temporal difference algorithm to achieve efficient approximation and improve convergence. Experimental results show that the kernel temporal difference algorithm with the distribution-based discriminant has better convergence performance.
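For item i, the following is a minimal sketch of the kernel least-squares temporal difference fixed point as it is commonly formulated in the literature; the notation (dictionary samples s_1, ..., s_n, kernel k, discount factor gamma) is ours and is not taken from the thesis itself.

```latex
% Kernel LSTD, standard formulation (notation ours, not the thesis's).
% The value function is expanded over dictionary samples s_1, ..., s_n:
\[
  V(s) = \sum_{i=1}^{n} \alpha_i \, k(s, s_i), \qquad
  \mathbf{k}_t = \bigl( k(s_t, s_1), \dots, k(s_t, s_n) \bigr)^{\top}
\]
% The coefficients solve the least-squares TD fixed-point equations,
% accumulated over observed transitions (s_t, r_t, s_{t+1}):
\[
  A = \sum_{t} \mathbf{k}_t \bigl( \mathbf{k}_t - \gamma \, \mathbf{k}_{t+1} \bigr)^{\top},
  \qquad
  b = \sum_{t} r_t \, \mathbf{k}_t,
  \qquad
  \alpha = A^{-1} b .
\]
```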
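For item ii, here is a minimal runnable sketch of ALD-based dictionary sparsification together with a Sherman-Morrison rank-one inverse update. The abstract names both techniques but gives no code, so the kernel choice, the threshold nu, and all identifiers below are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # Gaussian RBF kernel; sigma is a hypothetical bandwidth choice.
    return np.exp(-np.linalg.norm(x - y) ** 2 / (2 * sigma ** 2))

class ALDDictionary:
    """ALD sparsification: a new sample joins the dictionary only if its
    kernel feature cannot be approximated, within tolerance nu, by a
    linear combination of the features of the existing members."""
    def __init__(self, kernel, nu=0.1):
        self.kernel, self.nu = kernel, nu
        self.samples = []   # retained dictionary samples
        self.K_inv = None   # inverse of the dictionary Gram matrix

    def consider(self, s):
        if not self.samples:
            self.samples.append(s)
            self.K_inv = np.array([[1.0 / self.kernel(s, s)]])
            return True
        k_vec = np.array([self.kernel(s, d) for d in self.samples])
        c = self.K_inv @ k_vec                 # best approximation coefficients
        delta = self.kernel(s, s) - k_vec @ c  # ALD residual
        if delta > self.nu:                    # not representable: add sample
            self.samples.append(s)
            # Block update of the Gram matrix inverse (no re-inversion).
            n = len(self.samples) - 1
            K_inv_new = np.zeros((n + 1, n + 1))
            K_inv_new[:n, :n] = self.K_inv + np.outer(c, c) / delta
            K_inv_new[:n, n] = -c / delta
            K_inv_new[n, :n] = -c / delta
            K_inv_new[n, n] = 1.0 / delta
            self.K_inv = K_inv_new
            return True
        return False

def sherman_morrison_update(A_inv, u, v):
    # Rank-one inverse update, (A + u v^T)^{-1}, used to refresh the
    # LSTD matrix inverse per transition instead of re-inverting.
    Au = A_inv @ u
    vA = v @ A_inv
    return A_inv - np.outer(Au, vA) / (1.0 + v @ Au)
```

Maintaining A^{-1} through sherman_morrison_update turns each per-sample LSTD update from a cubic-cost matrix inversion into a quadratic-cost rank-one correction, which is the complexity saving the abstract attributes to the Sherman-Morrison formula.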
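For item iii, the thesis's actual discriminant for choosing between the Gaussian RBF and polynomial kernels is not given in this abstract, so the sketch below substitutes a simple spread statistic purely for illustration: tightly clustered samples get the local Gaussian kernel, widely dispersed samples the global polynomial kernel. Every name and threshold here is a hypothetical stand-in.

```python
import numpy as np

def gaussian_rbf(x, y, sigma=1.0):
    # Local kernel: responds mainly to nearby samples.
    return np.exp(-np.linalg.norm(x - y) ** 2 / (2 * sigma ** 2))

def polynomial(x, y, degree=3, c=1.0):
    # Global kernel: every pair of samples interacts.
    return (np.dot(x, y) + c) ** degree

def choose_kernel(samples, spread_threshold=1.0):
    """Illustrative stand-in for a distribution-based discriminant:
    mean distance to the centroid decides local vs. global kernel."""
    X = np.asarray(samples, dtype=float)
    spread = np.mean(np.linalg.norm(X - X.mean(axis=0), axis=1))
    return gaussian_rbf if spread < spread_threshold else polynomial
```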
Keywords/Search Tags: reinforcement learning, kernel function, temporal difference, sample approximation, sparsification