Reinforcement learning acquires information through interaction with the environment and then uses that information to improve the policy. In recent years, reinforcement learning has received extensive attention in academia and has become an important branch of machine learning. Traditional reinforcement learning in large-scale state spaces usually represents the value function with parametric function approximation, which suffers from slow convergence and low policy precision. Nonparametric function approximation is a flexible, fully sample-based form of function approximation with high accuracy and fast convergence, and it is consistent with the fundamental principle of reinforcement learning.

Building on existing algorithms, this paper combines nonparametric function approximation with reinforcement learning and proposes:

(1) To address the low sample efficiency of reinforcement learning, a Prioritized Sweeping Based Nonparametric LSPI algorithm, which combines prioritized sweeping with nonparametric function approximation and reduces exploration through model learning;

(2) To address the high time complexity of algorithms based on nonparametric function approximation, a Sparse Sample-based Gaussian Process Policy Iteration algorithm, which models the reinforcement learning problem with Gaussian processes and uses kernel sparsification to reduce redundancy in the sample space (illustrated in the sketch below).

These algorithms improve accuracy and convergence speed by increasing sample efficiency and reducing redundant exploration.
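To make the kernel sparsification step in contribution (2) concrete, the following is a minimal sketch of one common approach, approximate linear dependence (ALD), for discarding samples that are nearly linear combinations of already-kept samples in feature space. The function names, the RBF kernel choice, and the threshold `nu` are illustrative assumptions, not the thesis's actual implementation.

```python
# Hypothetical sketch of ALD-based kernel sparsification (not the thesis's code).
import numpy as np


def rbf_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two state vectors."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2))


def build_sparse_dictionary(samples, nu=0.1, sigma=1.0):
    """Greedily select a sparse dictionary of samples.

    A candidate is kept only if its ALD residual exceeds `nu`, i.e. it cannot
    be well approximated in feature space by the samples already kept.
    """
    dictionary = []
    K_inv = None  # inverse of the kernel matrix over the current dictionary
    for x in samples:
        kxx = rbf_kernel(x, x, sigma)
        if not dictionary:
            dictionary.append(x)
            K_inv = np.array([[1.0 / kxx]])
            continue
        # kernel vector between the candidate and the current dictionary
        k_vec = np.array([rbf_kernel(d, x, sigma) for d in dictionary])
        a = K_inv @ k_vec                # best linear reconstruction weights
        delta = kxx - k_vec @ a          # ALD residual (novelty measure)
        if delta > nu:
            # grow the dictionary; update the inverse kernel matrix incrementally
            n = len(dictionary)
            K_inv_new = np.zeros((n + 1, n + 1))
            K_inv_new[:n, :n] = K_inv + np.outer(a, a) / delta
            K_inv_new[:n, n] = -a / delta
            K_inv_new[n, :n] = -a / delta
            K_inv_new[n, n] = 1.0 / delta
            K_inv = K_inv_new
            dictionary.append(x)
    return dictionary


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    states = rng.uniform(-1.0, 1.0, size=(500, 2))  # redundant 2-D state samples
    sparse_set = build_sparse_dictionary(states, nu=0.1, sigma=0.5)
    print(f"kept {len(sparse_set)} of {len(states)} samples")
```

The incremental block-inverse update avoids refactorizing the kernel matrix for every candidate, which is one way such a sparsification step can keep the cost of fitting the subsequent Gaussian process model manageable.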