
Research On Goal-oriented Model-based Reinforcement Learning

Posted on: 2022-03-27
Degree: Master
Type: Thesis
Country: China
Candidate: T Qiu
Full Text: PDF
GTID: 2518306560991879
Subject: Computer technology
Abstract/Summary:
The ability to solve sequential decision-making problems is one of the core capabilities of artificial intelligence, and reinforcement learning is a general method for solving such problems. Model-free reinforcement learning algorithms have achieved significant results in many applications, but they require a large amount of interaction with the environment to collect enough data for policy training. Model-based reinforcement learning algorithms instead learn a dynamics model of the environment, making use of low-reward data that model-free algorithms exploit poorly, so that the policy can be trained on data simulated by the model; this greatly reduces the number of interactions required with the real environment. Model-based reinforcement learning grew out of optimal control, where it was originally used to solve sequential decision problems with a completely known model, and optimal-control algorithms usually require little or no interaction to obtain an optimal policy. Using a model can also improve an algorithm's adaptability and scalability across scenarios, and the model's predictive ability mirrors the prediction-and-planning mode of human intelligence. In complex environments, however, a learned model cannot avoid large prediction errors, which makes the resulting algorithms perform worse than their model-free counterparts.

This thesis analyzes the impact of model error on the performance of reinforcement learning algorithms and proposes a method for optimizing the model itself, called the goal-oriented model. Instantiated within the Dyna framework, the resulting algorithm achieves better results than state-of-the-art model-free and model-based reinforcement learning algorithms on several reinforcement learning benchmark environments. The goal-oriented model uses the state-value estimates provided by the model-free algorithm to compute the temporal-difference error of each state, which indicates the importance of that experience; the model is then trained with prioritized experience replay based on these temporal-difference errors. Combined with the basic Dyna framework, the thesis designs a reinforcement learning algorithm based on the goal-oriented model: the optimized model generates simulated interaction experience, and the model-free algorithm trains on real and simulated experience together, reducing the interaction needed in the real environment.

The goal-oriented model method is evaluated and analyzed on a series of MuJoCo control benchmark tasks. Experiments show that by adjusting how the model is trained, the method reduces the model's prediction error and maintains stable, high performance in long-horizon prediction. It significantly improves the sample efficiency of model-free reinforcement learning algorithms and extends readily to existing state-of-the-art model-free and model-based algorithms that explicitly estimate state values or state-action values.
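To make the prioritization mechanism described above concrete, the following is a minimal Python sketch of one way transitions could be weighted by temporal-difference error when training the dynamics model. It is an illustration under stated assumptions, not code from the thesis: the names PrioritizedBuffer, value_fn, alpha, and eps are all hypothetical.

    # Sketch only: proportional prioritized replay keyed on |TD error|,
    # where the TD error comes from the model-free learner's value estimates.
    # All identifiers here are illustrative assumptions, not from the thesis.
    import numpy as np

    class PrioritizedBuffer:
        """Stores (s, a, r, s', done) tuples; samples in proportion to priority."""

        def __init__(self, capacity, alpha=0.6, eps=1e-4):
            self.capacity, self.alpha, self.eps = capacity, alpha, eps
            self.data, self.priorities, self.pos = [], [], 0

        def add(self, transition, td_err):
            priority = (abs(td_err) + self.eps) ** self.alpha
            if len(self.data) < self.capacity:
                self.data.append(transition)
                self.priorities.append(priority)
            else:  # buffer full: overwrite the oldest entry
                self.data[self.pos] = transition
                self.priorities[self.pos] = priority
                self.pos = (self.pos + 1) % self.capacity

        def sample(self, batch_size):
            p = np.asarray(self.priorities)
            p = p / p.sum()
            idx = np.random.choice(len(self.data), size=batch_size, p=p)
            return [self.data[i] for i in idx]

    def td_error(value_fn, s, r, s_next, done, gamma=0.99):
        """|r + gamma * V(s') - V(s)|, used here as the importance signal."""
        target = r + (0.0 if done else gamma * value_fn(s_next))
        return abs(target - value_fn(s))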
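Building on that buffer, a schematic Dyna-style iteration under the same assumptions might look as follows. Here model.fit, model.predict, agent.act, agent.value, and agent.update are hypothetical interfaces standing in for the learned dynamics model and the model-free learner (for example, an actor-critic); the old-style env.step return signature is likewise an assumption.

    # Sketch of one Dyna iteration: collect real data with TD-error
    # priorities, train the model on prioritized batches, branch short
    # simulated rollouts, then update the agent on real + simulated data.
    def dyna_iteration(env, agent, model, buffer, sim_buffer,
                       real_steps=1000, rollout_len=5, batch_size=256):
        # 1) Collect real experience and store it with TD-error priorities.
        s = env.reset()
        for _ in range(real_steps):
            a = agent.act(s)
            s_next, r, done, _ = env.step(a)
            buffer.add((s, a, r, s_next, done),
                       td_error(agent.value, s, r, s_next, done))
            s = env.reset() if done else s_next

        # 2) Train the dynamics model on prioritized batches, focusing its
        #    capacity on high-TD-error (important) regions of experience.
        model.fit(buffer.sample(batch_size))

        # 3) Branch short simulated rollouts from previously seen states.
        for (s, *_rest) in buffer.sample(batch_size):
            for _ in range(rollout_len):
                a = agent.act(s)
                s_next, r, done = model.predict(s, a)
                sim_buffer.append((s, a, r, s_next, done))
                if done:
                    break
                s = s_next

        # 4) Update the model-free learner on real and simulated experience.
        agent.update(buffer.sample(batch_size) + list(sim_buffer))

The short rollout length reflects the abstract's emphasis on controlling model prediction error: branching brief simulated trajectories from real states limits how far compounding model error can drift.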
Keywords/Search Tags: Model-based Reinforcement Learning, Temporal Difference Error, Prioritized Experience Replay