
Reinforcement Learning With Dynamic Uncertainty Awareness

Posted on: 2023-03-08
Degree: Master
Type: Thesis
Country: China
Candidate: S Shen
Full Text: PDF
GTID: 2568306845989599
Subject: Computer technology
Abstract/Summary:
Reinforcement learning algorithms have achieved considerable progress and success in recent years. However, their development and application remain largely confined to virtual domains such as video games, chess, and card games; real-world deployment has long been hindered by high sampling cost and unstable decision-making. Model-based reinforcement learning is one of the main approaches to improving sample efficiency and a key technique for applying reinforcement learning in the real world. Its performance, however, is limited by the prediction accuracy of the environment model learned by the agent: prediction error in the environment model degrades decision quality and is the bottleneck of model-based methods. This thesis contributes two parts of work on the prediction error of the environment model and its influence on the algorithm. The first part builds a more accurate environment model to directly reduce the prediction error; the second part improves the reinforcement learning algorithm to use the environment model more effectively, which reduces the influence of the model's prediction error on the decision-making agent and improves sample efficiency.

To obtain an accurate environment model, a proper mathematical model of the environment is first required. This thesis analyzes two kinds of uncertainty in the environment model, aleatoric uncertainty and epistemic uncertainty, and models them with a probabilistic neural network and ensemble learning respectively, so that the environment model has better predictive ability and provides more accurate trajectories for the agent. An ablation study shows that modeling both types of uncertainty improves the reinforcement learning algorithm.

In general, the longer the predicted trajectory, the larger the prediction error. To reduce the influence of the environment model's prediction error on the decision-making agent, the length of the planning trajectory must be adjustable. First, we extend finite-horizon planning to infinite-horizon planning by introducing a value function, so that the planning length can be adjusted indirectly through the λ coefficient. A value function with uncertainty estimates is then learned using ensemble learning plus a fixed random prior function. Combining these two points, we propose the MPV(λ) (Model Predictive Control with λ-Weighted Value Function) algorithm, which indirectly adjusts the planning horizon and improves sample efficiency by balancing the value function error against the environment model's prediction error.

Finally, this thesis evaluates the MPV(λ) algorithm on multiple robot control tasks in the MuJoCo simulation engine. Compared with traditional planning algorithms, the learned policy achieves higher cumulative reward, indicating that the proposed algorithm has higher sample efficiency. The influence of hyperparameters such as environment model error, planning length, and the number of action sequences is also verified through parameter studies.
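The uncertainty decomposition described above can be made concrete with a short sketch. Below is a minimal, illustrative PyTorch implementation, not the thesis's exact architecture: following the standard convention, aleatoric uncertainty is captured by a network that outputs a Gaussian mean and variance, and epistemic uncertainty by the disagreement among independently trained ensemble members. All class names, dimensions, and hyperparameters here are assumptions for illustration.

```python
# Sketch of a probabilistic ensemble dynamics model (PyTorch).
# Aleatoric uncertainty: each member outputs a Gaussian (mean, log-variance).
# Epistemic uncertainty: disagreement across independently trained members.
# Names and sizes are illustrative, not taken from the thesis.
import torch
import torch.nn as nn

class ProbabilisticDynamics(nn.Module):
    """One ensemble member: predicts p(s' - s | s, a) as a diagonal Gaussian."""
    def __init__(self, state_dim, action_dim, hidden=200):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, 2 * state_dim),   # mean and log-variance heads
        )

    def forward(self, state, action):
        mean, log_var = self.net(torch.cat([state, action], -1)).chunk(2, -1)
        log_var = log_var.clamp(-10.0, 4.0)     # keep variances numerically sane
        return mean, log_var

def nll_loss(model, state, action, next_state):
    """Gaussian negative log-likelihood; trains the aleatoric variance head."""
    mean, log_var = model(state, action)
    target = next_state - state                 # predict the state delta
    inv_var = torch.exp(-log_var)
    return (((target - mean) ** 2) * inv_var + log_var).mean()

# Epistemic uncertainty: train several members (e.g. on bootstrapped data),
# then measure the spread of their predicted means at a query point.
ensemble = [ProbabilisticDynamics(17, 6) for _ in range(5)]  # dims illustrative
```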
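The λ-coefficient mechanism can likewise be sketched. Assuming a TD(λ)-style weighting of n-step model returns bootstrapped with the value function (the standard way such a λ coefficient trades rollout length against value error; the abstract does not give the thesis's exact formula), a minimal version is:

```python
# Hedged sketch: λ-weighted return over an H-step imagined rollout, in the
# spirit of the abstract's MPV(λ). Function name and interface are assumed.
def lambda_return(rewards, values, gamma=0.99, lam=0.95):
    """rewards: [r_0, ..., r_{H-1}] from the model rollout.
    values:  [V(s_1), ..., V(s_H)] bootstrap estimates along the rollout."""
    H = len(rewards)
    g, disc, n_step = 0.0, 1.0, []
    for n in range(H):
        g += disc * rewards[n]
        disc *= gamma
        n_step.append(g + disc * values[n])  # G^(n+1): n+1 rewards + bootstrap
    # Mixture (1-λ) Σ_{n=1}^{H-1} λ^{n-1} G^(n)  +  λ^{H-1} G^(H); weights sum to 1.
    total = 0.0
    for n in range(H - 1):
        total += (1 - lam) * (lam ** n) * n_step[n]
    total += (lam ** (H - 1)) * n_step[-1]
    return total
```

With λ near 0 the score effectively trusts the value function after a single imagined step; with λ near 1 it relies on the full H-step model rollout. This is how the λ coefficient indirectly controls the planning horizon and lets the algorithm balance value-function error against model prediction error.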
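The "ensemble learning plus a fixed random prior function" for the value estimate matches the randomized-prior idea: each ensemble member adds a frozen, randomly initialized network to a trainable one, and the spread across members gives an epistemic uncertainty estimate. A hedged sketch, with illustrative names and sizes:

```python
# Sketch of a value ensemble with fixed random prior functions.
# The thesis's exact architecture and prior scale may differ.
import torch
import torch.nn as nn

class PriorValue(nn.Module):
    """V(s) = trainable(s) + beta * prior(s); the prior's weights stay frozen."""
    def __init__(self, state_dim, hidden=128, beta=1.0):
        super().__init__()
        def make():
            return nn.Sequential(
                nn.Linear(state_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.trainable, self.prior, self.beta = make(), make(), beta
        for p in self.prior.parameters():
            p.requires_grad_(False)          # fixed random prior, never trained

    def forward(self, s):
        return self.trainable(s) + self.beta * self.prior(s)

values = [PriorValue(17) for _ in range(5)]  # ensemble; dims illustrative
# Epistemic value uncertainty at a state: std of the members' predictions.
```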
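Finally, one possible shape of the outer planning loop: random-shooting model predictive control that rolls each candidate action sequence through the learned model and scores it with the λ-return above. This is an assumed, simplified stand-in for the thesis's planner (which may use CEM or another sequence optimizer); model.step, value_fn, and all hyperparameters are hypothetical.

```python
# Hedged sketch of random-shooting MPC using the learned model and the
# lambda_return helper above. model.step and value_fn are assumed interfaces.
import numpy as np

def plan(state, model, value_fn, act_dim, horizon=10, n_sequences=500, lam=0.95):
    best_score, best_action = -np.inf, None
    for _ in range(n_sequences):
        actions = np.random.uniform(-1.0, 1.0, size=(horizon, act_dim))
        s, rewards, values = state, [], []
        for a in actions:                    # imagined rollout in the model
            s, r = model.step(s, a)          # e.g. a sampled ensemble member
            rewards.append(r)
            values.append(value_fn(s))       # bootstrap value along the rollout
        score = lambda_return(rewards, values, lam=lam)
        if score > best_score:
            best_score, best_action = score, actions[0]
    return best_action                       # execute first action, then replan
```

The hyperparameters exposed here (horizon, n_sequences, λ) correspond to the quantities the abstract says were studied experimentally: planning length and the number of action sequences.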
Keywords/Search Tags:model-based reinforcement learning, uncertainty, sample efficiency, planning algorithm, reinforcement learning