
Research On Data-efficient Reinforcement Learning Based On Local Gaussian Process Regression

Posted on: 2022-05-31 | Degree: Master | Type: Thesis
Country: China | Candidate: G Chen | Full Text: PDF
GTID: 2518306572960649 | Subject: Control Engineering
Abstract/Summary:
In control and robotics research, we often face challenging decision-making problems in which data is scarce or the process is complex and partly unknown. In such cases, it is valuable to design algorithms that learn from data and use what they learn for decision-making. Reinforcement learning (RL) is a general-purpose, experience-driven and goal-oriented computational method suited to decision-making under uncertainty. However, in the absence of task-specific engineering knowledge, RL usually requires many interactions with the environment; that is, it lacks interaction efficiency. For control and decision problems with scarce data (few interactions), this thesis therefore studies model-based reinforcement learning and proposes a data-efficient RL algorithm based on local Gaussian process regression.

The thesis uses Gaussian process regression to learn a probabilistic dynamics model, quantifying the knowledge extracted from the data in the form of a probability distribution. Because the complexity of Gaussian process regression is cubic in the number of samples, a local Gaussian process regression is proposed to address the high computational cost. Its essential idea is "divide and conquer": first, the K-means clustering algorithm partitions the full dataset into small local subsets; next, a local Gaussian process model is learned on each subset; finally, the dynamics model is obtained as a weighted average of the local models.

The goal of reinforcement learning is to maximize the expected long-term reward, and the probabilistic dynamics model is used to predict future rewards. The thesis derives the long-term state distribution and, after analyzing and comparing several methods for predicting the state evolution, adopts Monte Carlo sampling: rather than explicitly computing the reward at every step, the expected return of the whole rollout is decomposed into the true reward function and a noise term whose mathematical expectation is assumed to be zero. The problem under study is thereby transformed into a noisy function-optimization problem.

To solve the resulting optimal-policy search problem, the thesis studies estimation-of-distribution algorithms and adopts the Covariance Matrix Adaptation Evolution Strategy (CMA-ES), an excellent black-box optimizer that is particularly well suited to noisy optimization problems. The algorithm maintains two evolution paths that compensate for small population sizes: one accumulates the search directions of successive generations to adapt the covariance matrix, and the other controls the global step size. Together, the two paths realize the policy optimization.

Finally, a double inverted pendulum, a nonlinear, highly coupled, underactuated dynamic system, is used to verify the effectiveness and data efficiency of the algorithm on the Gym and MuJoCo simulation platforms. Comparison with the classic data-efficient algorithm PILCO verifies that local Gaussian process regression effectively mitigates the high computational cost of standard Gaussian process regression, and that CMA-ES finds better solutions than gradient-based search.
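As an illustration of the local Gaussian process regression described above, the following is a minimal sketch in Python. It assumes scikit-learn's KMeans and GaussianProcessRegressor; the class name LocalGPR and the inverse-distance weighting of the local predictions are illustrative choices, not necessarily the exact weighting scheme used in the thesis.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    class LocalGPR:
        """Divide and conquer: K-means partitions the data, one GP is fit
        per cluster, and predictions are a weighted average of the local
        models."""

        def __init__(self, n_clusters=5):
            self.n_clusters = n_clusters

        def fit(self, X, y):
            self.km = KMeans(n_clusters=self.n_clusters, n_init=10).fit(X)
            self.gps = []
            for k in range(self.n_clusters):
                mask = self.km.labels_ == k
                gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel())
                self.gps.append(gp.fit(X[mask], y[mask]))
            return self

        def predict(self, Xq):
            # Weight each local GP by inverse distance to its cluster center.
            d = np.linalg.norm(Xq[:, None, :] - self.km.cluster_centers_[None],
                               axis=2)
            w = 1.0 / (d + 1e-8)
            w /= w.sum(axis=1, keepdims=True)
            preds = np.stack([gp.predict(Xq) for gp in self.gps], axis=1)
            return (w * preds).sum(axis=1)

Each local model is fit on roughly n/K samples, so the cubic training cost drops from O(n^3) to about K * O((n/K)^3), which is the source of the computational saving.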
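The Monte Carlo treatment of the expected long-term reward can be sketched as follows. The names policy, dynamics_sample, and reward are hypothetical placeholders for the learned controller, a sampler from the probabilistic GP dynamics model, and the reward function; averaging many rollouts estimates the true expected return under the zero-mean noise assumption stated above.

    import numpy as np

    def expected_return(policy, dynamics_sample, reward, x0,
                        horizon=50, n_rollouts=100, rng=None):
        """Monte Carlo estimate of the expected long-term reward.
        dynamics_sample(x, u, rng) draws a next state from the learned
        probabilistic dynamics model."""
        rng = rng or np.random.default_rng()
        total = 0.0
        for _ in range(n_rollouts):
            x, ret = x0, 0.0
            for _ in range(horizon):
                u = policy(x)
                x = dynamics_sample(x, u, rng)
                ret += reward(x)
            total += ret
        return total / n_rollouts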
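Finally, the CMA-ES policy search can be written as a short loop, sketched here with the open-source cma package (pip install cma); the objective would wrap the Monte Carlo estimator above, negated because CMA-ES minimizes. The two evolution paths (covariance adaptation and step-size control) are maintained internally by the library.

    import numpy as np
    import cma  # open-source CMA-ES implementation

    def search_policy(init_params, objective, sigma0=0.5, iterations=100):
        """Black-box policy search: minimize the noisy objective,
        i.e. the negative Monte Carlo estimate of the expected return."""
        es = cma.CMAEvolutionStrategy(init_params, sigma0)
        for _ in range(iterations):
            candidates = es.ask()  # sample a population from N(m, sigma^2 C)
            es.tell(candidates, [objective(np.asarray(c)) for c in candidates])
        return es.result.xbest     # best policy parameters found

    # Hypothetical usage, with make_policy mapping parameters to a controller:
    # objective = lambda p: -expected_return(make_policy(p), dynamics_sample,
    #                                        reward, x0)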
Keywords/Search Tags: Local Gaussian Process Regression, Data-Efficient, CMA-ES, Mechanical System