
Reinforcement Learning With Dynamic Uncertainty Awareness

Posted on: 2023-03-08
Degree: Master
Type: Thesis
Country: China
Candidate: S Shen
Full Text: PDF
GTID: 2568306845989599
Subject: Computer technology
Abstract/Summary:
Reinforcement learning algorithms have achieved considerable progress and success in recent years. However, their development and application remain largely confined to virtual domains such as video games, chess, and card games; real-world deployment has long been hindered by high sampling cost and unstable decision-making. Model-based reinforcement learning is one of the main approaches to improving sample efficiency and a key technique for applying reinforcement learning in the real world. Its performance, however, is limited by the prediction accuracy of the environment model learned by the agent: prediction error in the environment model degrades decision quality and is the bottleneck of model-based methods. This thesis contributes two parts of work on the prediction error of the environment model and its influence on the algorithm. The first part builds a more accurate environment model to directly reduce the prediction error; the second part improves the reinforcement learning algorithm to use the environment model more effectively, which reduces the influence of the model's prediction error on the decision-making agent and improves sample efficiency.

To obtain an accurate environment model, a proper mathematical model of the environment is first required. This thesis analyzes two kinds of uncertainty in the environment model, aleatoric uncertainty and epistemic uncertainty, and models them with a probabilistic neural network and ensemble learning respectively, so that the environment model has better predictive ability and provides more accurate trajectories for the agent. An ablation study shows that modeling both types of uncertainty improves the reinforcement learning algorithm.

In general, the longer the predicted trajectory, the larger the prediction error. To reduce the influence of the environment model's prediction error on the decision-making agent, the length of the planning trajectory must be adjustable. First, we extend finite-horizon planning to infinite-horizon planning by introducing a value function, so that the planning length can be adjusted indirectly through the λ coefficient. A value function with uncertainty estimates is then learned using ensemble learning plus a fixed random prior function. Combining these two points, we propose the MPV(λ) (Model Predictive Control with λ-Weighted Value Function) algorithm, which indirectly adjusts the planning horizon and improves sample efficiency by balancing the value function error against the environment model's prediction error.

Finally, this thesis evaluates the MPV(λ) algorithm on multiple robot control tasks in the MuJoCo simulation engine. Compared with traditional planning algorithms, the learned policy achieves higher cumulative reward, indicating that the proposed algorithm has higher sample efficiency. The influence of hyperparameters such as environment model error, planning length, and the number of action sequences is also verified through parameter studies.
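The uncertainty decomposition described above can be made concrete with a short sketch. Below is a minimal, illustrative PyTorch implementation, not the thesis's exact architecture: following the standard convention, aleatoric uncertainty is captured by a network that outputs a Gaussian mean and variance, and epistemic uncertainty by the disagreement among independently trained ensemble members. All class names, dimensions, and hyperparameters here are assumptions for illustration.

```python
# Sketch of a probabilistic ensemble dynamics model (PyTorch).
# Aleatoric uncertainty: each member outputs a Gaussian (mean, log-variance).
# Epistemic uncertainty: disagreement across independently trained members.
# Names and sizes are illustrative, not taken from the thesis.
import torch
import torch.nn as nn

class ProbabilisticDynamics(nn.Module):
    """One ensemble member: predicts p(s' - s | s, a) as a diagonal Gaussian."""
    def __init__(self, state_dim, action_dim, hidden=200):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, 2 * state_dim),   # mean and log-variance heads
        )

    def forward(self, state, action):
        mean, log_var = self.net(torch.cat([state, action], -1)).chunk(2, -1)
        log_var = log_var.clamp(-10.0, 4.0)     # keep variances numerically sane
        return mean, log_var

def nll_loss(model, state, action, next_state):
    """Gaussian negative log-likelihood; trains the aleatoric variance head."""
    mean, log_var = model(state, action)
    target = next_state - state                 # predict the state delta
    inv_var = torch.exp(-log_var)
    return (((target - mean) ** 2) * inv_var + log_var).mean()

# Epistemic uncertainty: train several members (e.g. on bootstrapped data),
# then measure the spread of their predicted means at a query point.
ensemble = [ProbabilisticDynamics(17, 6) for _ in range(5)]  # dims illustrative
```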
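The λ-coefficient mechanism can likewise be sketched. Assuming a TD(λ)-style weighting of n-step model returns bootstrapped with the value function (the standard way such a λ coefficient trades rollout length against value error; the abstract does not give the thesis's exact formula), a minimal version is:

```python
# Hedged sketch: λ-weighted return over an H-step imagined rollout, in the
# spirit of the abstract's MPV(λ). Function name and interface are assumed.
def lambda_return(rewards, values, gamma=0.99, lam=0.95):
    """rewards: [r_0, ..., r_{H-1}] from the model rollout.
    values:  [V(s_1), ..., V(s_H)] bootstrap estimates along the rollout."""
    H = len(rewards)
    g, disc, n_step = 0.0, 1.0, []
    for n in range(H):
        g += disc * rewards[n]
        disc *= gamma
        n_step.append(g + disc * values[n])  # G^(n+1): n+1 rewards + bootstrap
    # Mixture (1-λ) Σ_{n=1}^{H-1} λ^{n-1} G^(n)  +  λ^{H-1} G^(H); weights sum to 1.
    total = 0.0
    for n in range(H - 1):
        total += (1 - lam) * (lam ** n) * n_step[n]
    total += (lam ** (H - 1)) * n_step[-1]
    return total
```

With λ near 0 the score effectively trusts the value function after a single imagined step; with λ near 1 it relies on the full H-step model rollout. This is how the λ coefficient indirectly controls the planning horizon and lets the algorithm balance value-function error against model prediction error.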
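The "ensemble learning plus a fixed random prior function" for the value estimate matches the randomized-prior idea: each ensemble member adds a frozen, randomly initialized network to a trainable one, and the spread across members gives an epistemic uncertainty estimate. A hedged sketch, with illustrative names and sizes:

```python
# Sketch of a value ensemble with fixed random prior functions.
# The thesis's exact architecture and prior scale may differ.
import torch
import torch.nn as nn

class PriorValue(nn.Module):
    """V(s) = trainable(s) + beta * prior(s); the prior's weights stay frozen."""
    def __init__(self, state_dim, hidden=128, beta=1.0):
        super().__init__()
        def make():
            return nn.Sequential(
                nn.Linear(state_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.trainable, self.prior, self.beta = make(), make(), beta
        for p in self.prior.parameters():
            p.requires_grad_(False)          # fixed random prior, never trained

    def forward(self, s):
        return self.trainable(s) + self.beta * self.prior(s)

values = [PriorValue(17) for _ in range(5)]  # ensemble; dims illustrative
# Epistemic value uncertainty at a state: std of the members' predictions.
```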
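Finally, one possible shape of the outer planning loop: random-shooting model predictive control that rolls each candidate action sequence through the learned model and scores it with the λ-return above. This is an assumed, simplified stand-in for the thesis's planner (which may use CEM or another sequence optimizer); model.step, value_fn, and all hyperparameters are hypothetical.

```python
# Hedged sketch of random-shooting MPC using the learned model and the
# lambda_return helper above. model.step and value_fn are assumed interfaces.
import numpy as np

def plan(state, model, value_fn, act_dim, horizon=10, n_sequences=500, lam=0.95):
    best_score, best_action = -np.inf, None
    for _ in range(n_sequences):
        actions = np.random.uniform(-1.0, 1.0, size=(horizon, act_dim))
        s, rewards, values = state, [], []
        for a in actions:                    # imagined rollout in the model
            s, r = model.step(s, a)          # e.g. a sampled ensemble member
            rewards.append(r)
            values.append(value_fn(s))       # bootstrap value along the rollout
        score = lambda_return(rewards, values, lam=lam)
        if score > best_score:
            best_score, best_action = score, actions[0]
    return best_action                       # execute first action, then replan
```

The hyperparameters exposed here (horizon, n_sequences, λ) correspond to the quantities the abstract says were studied experimentally: planning length and the number of action sequences.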
Keywords/Search Tags:model-based reinforcement learning, uncertainty, sample efficiency, planning algorithm, reinforcement learning