An Information-theoretic Exploration Approach For Deep Reinforcement Learning

Posted on: 2023-03-06
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Liu
GTID: 2558307154475014
Subject: Engineering
Abstract/Summary:
Recently, deep reinforcement learning (DRL) has made great progress in solving intelligent decision-making problems, and an efficient exploration strategy is crucial to the performance of DRL algorithms. Many exploration strategies build on the principle of optimism in the face of uncertainty, guiding the agent to explore regions with high uncertainty. However, because they do not account for aleatoric uncertainty, existing methods may over-explore state-action pairs with large randomness and are therefore not robust. In this paper, we explicitly capture aleatoric uncertainty from a distributional perspective and propose an information-theoretic exploration method named Optimistic Value Distribution Explorer (OVD-Explorer), which avoids areas with high aleatoric uncertainty during exploration. Following the optimism principle, OVD-Explorer guides the agent to explore areas with high epistemic uncertainty by maximizing the mutual information between the optimistic value distribution and the policy. More importantly, because this optimization takes the value distribution into account, it also steers the agent away from areas with high aleatoric uncertainty. Furthermore, to make OVD-Explorer tractable for continuous RL, this study derives a closed-form solution and proposes a scheme for combining OVD-Explorer with any policy-based RL algorithm. Concretely, OVD-Explorer is integrated with distributional SAC in this work, which, to the best of our knowledge, is the first to alleviate the negative impact of aleatoric uncertainty on exploration in continuous RL. Empirical evaluations on the commonly used MuJoCo benchmark and a novel Grid Chaos task demonstrate that OVD-Explorer can alleviate over-exploration and outperform state-of-the-art methods.
Keywords/Search Tags: Uncertainty, Exploration strategy, Deep reinforcement learning