
Research On Data-efficient Reinforcement Learning Based On Local Gaussian Process Regression

Posted on: 2022-05-31 | Degree: Master | Type: Thesis
Country: China | Candidate: G Chen | Full Text: PDF
GTID: 2518306572960649 | Subject: Control Engineering
Abstract/Summary:
In control and robotics research, we often face challenging decision-making problems in which data is scarce or the process is complex and partly unknown. In such cases, it is valuable to design algorithms that learn from data and use what they learn for decision-making. Reinforcement learning (RL) is a general-purpose, experience-driven and goal-oriented computational method suited to decision-making under uncertainty. However, in the absence of task-specific engineering knowledge, RL usually requires many interactions with the environment; that is, it lacks interaction efficiency. For control and decision problems with scarce data (few interactions), this thesis therefore studies model-based reinforcement learning and proposes a data-efficient RL algorithm based on local Gaussian process regression.

The thesis uses Gaussian process regression to learn a probabilistic dynamics model, quantifying the knowledge extracted from the data in the form of a probability distribution. Because the complexity of Gaussian process regression is cubic in the number of samples, a local Gaussian process regression is proposed to address the high computational cost. Its essential idea is "divide and conquer": first, the K-means clustering algorithm partitions the full dataset into small local subsets; next, a local Gaussian process model is learned on each subset; finally, the dynamics model is obtained as a weighted average of the local models.

The goal of reinforcement learning is to maximize the expected long-term reward, and the probabilistic dynamics model is used to predict future rewards. The thesis derives the long-term state distribution and, after analyzing and comparing several methods for predicting the state evolution, adopts Monte Carlo sampling: rather than explicitly computing the reward at every step, the expected return of the whole rollout is decomposed into the true reward function and a noise term whose mathematical expectation is assumed to be zero. The problem under study is thereby transformed into a noisy function-optimization problem.

To solve the resulting optimal-policy search problem, the thesis studies estimation-of-distribution algorithms and adopts the Covariance Matrix Adaptation Evolution Strategy (CMA-ES), an excellent black-box optimizer that is particularly well suited to noisy optimization problems. The algorithm maintains two evolution paths that compensate for small population sizes: one accumulates the search directions of successive generations to adapt the covariance matrix, and the other controls the global step size. Together, the two paths realize the policy optimization.

Finally, a double inverted pendulum, a nonlinear, highly coupled, underactuated dynamic system, is used to verify the effectiveness and data efficiency of the algorithm on the Gym and MuJoCo simulation platforms. Comparison with the classic data-efficient algorithm PILCO verifies that local Gaussian process regression effectively mitigates the high computational cost of standard Gaussian process regression, and that CMA-ES finds better solutions than gradient-based search.
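As an illustration of the local Gaussian process regression described above, the following is a minimal sketch in Python. It assumes scikit-learn's KMeans and GaussianProcessRegressor; the class name LocalGPR and the inverse-distance weighting of the local predictions are illustrative choices, not necessarily the exact weighting scheme used in the thesis.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    class LocalGPR:
        """Divide and conquer: K-means partitions the data, one GP is fit
        per cluster, and predictions are a weighted average of the local
        models."""

        def __init__(self, n_clusters=5):
            self.n_clusters = n_clusters

        def fit(self, X, y):
            self.km = KMeans(n_clusters=self.n_clusters, n_init=10).fit(X)
            self.gps = []
            for k in range(self.n_clusters):
                mask = self.km.labels_ == k
                gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel())
                self.gps.append(gp.fit(X[mask], y[mask]))
            return self

        def predict(self, Xq):
            # Weight each local GP by inverse distance to its cluster center.
            d = np.linalg.norm(Xq[:, None, :] - self.km.cluster_centers_[None],
                               axis=2)
            w = 1.0 / (d + 1e-8)
            w /= w.sum(axis=1, keepdims=True)
            preds = np.stack([gp.predict(Xq) for gp in self.gps], axis=1)
            return (w * preds).sum(axis=1)

Each local model is fit on roughly n/K samples, so the cubic training cost drops from O(n^3) to about K * O((n/K)^3), which is the source of the computational saving.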
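The Monte Carlo treatment of the expected long-term reward can be sketched as follows. The names policy, dynamics_sample, and reward are hypothetical placeholders for the learned controller, a sampler from the probabilistic GP dynamics model, and the reward function; averaging many rollouts estimates the true expected return under the zero-mean noise assumption stated above.

    import numpy as np

    def expected_return(policy, dynamics_sample, reward, x0,
                        horizon=50, n_rollouts=100, rng=None):
        """Monte Carlo estimate of the expected long-term reward.
        dynamics_sample(x, u, rng) draws a next state from the learned
        probabilistic dynamics model."""
        rng = rng or np.random.default_rng()
        total = 0.0
        for _ in range(n_rollouts):
            x, ret = x0, 0.0
            for _ in range(horizon):
                u = policy(x)
                x = dynamics_sample(x, u, rng)
                ret += reward(x)
            total += ret
        return total / n_rollouts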
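Finally, the CMA-ES policy search can be written as a short loop, sketched here with the open-source cma package (pip install cma); the objective would wrap the Monte Carlo estimator above, negated because CMA-ES minimizes. The two evolution paths (covariance adaptation and step-size control) are maintained internally by the library.

    import numpy as np
    import cma  # open-source CMA-ES implementation

    def search_policy(init_params, objective, sigma0=0.5, iterations=100):
        """Black-box policy search: minimize the noisy objective,
        i.e. the negative Monte Carlo estimate of the expected return."""
        es = cma.CMAEvolutionStrategy(init_params, sigma0)
        for _ in range(iterations):
            candidates = es.ask()  # sample a population from N(m, sigma^2 C)
            es.tell(candidates, [objective(np.asarray(c)) for c in candidates])
        return es.result.xbest     # best policy parameters found

    # Hypothetical usage, with make_policy mapping parameters to a controller:
    # objective = lambda p: -expected_return(make_policy(p), dynamics_sample,
    #                                        reward, x0)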
Keywords/Search Tags: Local Gaussian Process Regression, Data-Efficient, CMA-ES, Mechanical System