
Research On Goal-oriented Model-based Reinforcement Learning

Posted on: 2022-03-27
Degree: Master
Type: Thesis
Country: China
Candidate: T Qiu
Full Text: PDF
GTID: 2518306560991879
Subject: Computer technology
Abstract/Summary:
The ability to solve sequential decision-making problems is one of the core capabilities of artificial intelligence, and reinforcement learning is a general method for solving such problems. Model-free reinforcement learning algorithms have achieved significant results in many applications, but they require a large amount of interaction with the environment to collect enough data for policy training. Model-based reinforcement learning algorithms instead learn a dynamics model of the environment, making use of low-reward data that model-free algorithms exploit poorly, so that the policy can be trained on data simulated by the model; this greatly reduces the number of interactions required with the real environment. Model-based reinforcement learning grew out of optimal control, where it was originally used to solve sequential decision problems with a completely known model, and optimal-control algorithms usually require little or no interaction to obtain an optimal policy. Using a model can also improve an algorithm's adaptability and scalability across scenarios, and the model's predictive ability mirrors the prediction-and-planning mode of human intelligence. In complex environments, however, a learned model cannot avoid large prediction errors, which makes the resulting algorithms perform worse than their model-free counterparts.

This thesis analyzes the impact of model error on the performance of reinforcement learning algorithms and proposes a method for optimizing the model itself, called the goal-oriented model. Instantiated within the Dyna framework, the resulting algorithm achieves better results than state-of-the-art model-free and model-based reinforcement learning algorithms on several reinforcement learning benchmark environments. The goal-oriented model uses the state-value estimates provided by the model-free algorithm to compute the temporal-difference error of each state, which indicates the importance of that experience; the model is then trained with prioritized experience replay based on these temporal-difference errors. Combined with the basic Dyna framework, the thesis designs a reinforcement learning algorithm based on the goal-oriented model: the optimized model generates simulated interaction experience, and the model-free algorithm trains on real and simulated experience together, reducing the interaction needed in the real environment.

The goal-oriented model method is evaluated and analyzed on a series of MuJoCo control benchmark tasks. Experiments show that by adjusting how the model is trained, the method reduces the model's prediction error and maintains stable, high performance in long-horizon prediction. It significantly improves the sample efficiency of model-free reinforcement learning algorithms and extends readily to existing state-of-the-art model-free and model-based algorithms that explicitly estimate state values or state-action values.
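To make the prioritization mechanism described above concrete, the following is a minimal Python sketch of one way transitions could be weighted by temporal-difference error when training the dynamics model. It is an illustration under stated assumptions, not code from the thesis: the names PrioritizedBuffer, value_fn, alpha, and eps are all hypothetical.

    # Sketch only: proportional prioritized replay keyed on |TD error|,
    # where the TD error comes from the model-free learner's value estimates.
    # All identifiers here are illustrative assumptions, not from the thesis.
    import numpy as np

    class PrioritizedBuffer:
        """Stores (s, a, r, s', done) tuples; samples in proportion to priority."""

        def __init__(self, capacity, alpha=0.6, eps=1e-4):
            self.capacity, self.alpha, self.eps = capacity, alpha, eps
            self.data, self.priorities, self.pos = [], [], 0

        def add(self, transition, td_err):
            priority = (abs(td_err) + self.eps) ** self.alpha
            if len(self.data) < self.capacity:
                self.data.append(transition)
                self.priorities.append(priority)
            else:  # buffer full: overwrite the oldest entry
                self.data[self.pos] = transition
                self.priorities[self.pos] = priority
                self.pos = (self.pos + 1) % self.capacity

        def sample(self, batch_size):
            p = np.asarray(self.priorities)
            p = p / p.sum()
            idx = np.random.choice(len(self.data), size=batch_size, p=p)
            return [self.data[i] for i in idx]

    def td_error(value_fn, s, r, s_next, done, gamma=0.99):
        """|r + gamma * V(s') - V(s)|, used here as the importance signal."""
        target = r + (0.0 if done else gamma * value_fn(s_next))
        return abs(target - value_fn(s))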
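Building on that buffer, a schematic Dyna-style iteration under the same assumptions might look as follows. Here model.fit, model.predict, agent.act, agent.value, and agent.update are hypothetical interfaces standing in for the learned dynamics model and the model-free learner (for example, an actor-critic); the old-style env.step return signature is likewise an assumption.

    # Sketch of one Dyna iteration: collect real data with TD-error
    # priorities, train the model on prioritized batches, branch short
    # simulated rollouts, then update the agent on real + simulated data.
    def dyna_iteration(env, agent, model, buffer, sim_buffer,
                       real_steps=1000, rollout_len=5, batch_size=256):
        # 1) Collect real experience and store it with TD-error priorities.
        s = env.reset()
        for _ in range(real_steps):
            a = agent.act(s)
            s_next, r, done, _ = env.step(a)
            buffer.add((s, a, r, s_next, done),
                       td_error(agent.value, s, r, s_next, done))
            s = env.reset() if done else s_next

        # 2) Train the dynamics model on prioritized batches, focusing its
        #    capacity on high-TD-error (important) regions of experience.
        model.fit(buffer.sample(batch_size))

        # 3) Branch short simulated rollouts from previously seen states.
        for (s, *_rest) in buffer.sample(batch_size):
            for _ in range(rollout_len):
                a = agent.act(s)
                s_next, r, done = model.predict(s, a)
                sim_buffer.append((s, a, r, s_next, done))
                if done:
                    break
                s = s_next

        # 4) Update the model-free learner on real and simulated experience.
        agent.update(buffer.sample(batch_size) + list(sim_buffer))

The short rollout length reflects the abstract's emphasis on controlling model prediction error: branching brief simulated trajectories from real states limits how far compounding model error can drift.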
Keywords/Search Tags: Model-based Reinforcement Learning, Temporal Difference Error, Prioritized Experience Replay