
Research And Implementation On Hierarchical Reinforcement Learning

Posted on: 2024-08-29  Degree: Master  Type: Thesis
Country: China  Candidate: S H Luo  Full Text: PDF
GTID: 2568306944459204  Subject: Information and Communication Engineering
Abstract/Summary:
With the development of deep reinforcement learning (DRL), agents have achieved human-expert-level performance on increasingly complex control tasks. To handle more challenging environments such as sparse-reward tasks, hierarchical reinforcement learning (HRL) employs multiple DRL agents as policies at different levels of a hierarchy, enabling the agent to make decisions with temporal and state abstraction. Subgoal-based HRL is a widely used HRL paradigm that breaks a complex task down into a set of subgoals and provides a denser intrinsic reward, making credit assignment much easier. The research in this thesis follows this architecture. Conventional subgoal-based HRL methods generate the intrinsic reward mainly from the distance between the current state and the subgoal. However, an intrinsic reward that depends solely on this distance fails to utilize additional task knowledge. Consequently, such methods cannot distinguish whether a state change results from the agent's actions or from the environment dynamics, which limits their sample efficiency. When the environment suffers random interference, existing HRL methods may fail to learn effective policies, because all of the interference is reflected in the distance between states and subgoals, and therefore in the intrinsic reward. To tackle this problem, this thesis proposes an Attention Reward (AR) based on a dimension-erasure method and the notion of contingency awareness. It encourages the agent to focus on state changes that result from its own actions and reduces the influence of random interference from the environment. To generate the attention reward, this thesis introduces a Contingent Weight that evaluates how much each dimension of the current state is affected by the agent's actions, and integrates it into conventional HRL methods to improve their sample efficiency. Our experiments show that the attention reward improves the sample efficiency of current HRL methods and that the contingent weight is interpretable. Furthermore, to be compatible with existing transition-relabeling methods, this thesis proposes Hierarchical Reinforcement Learning with Attention Reward (HiAR), which optimizes the reward granularity, reward scale, and reward testing process, enabling HiAR to further enhance sample efficiency through transition relabeling. HiAR improves sample efficiency on the Pendulum and UR5 Reacher tasks. Finally, this thesis presents applications of hierarchical reinforcement learning, including modeling a simulation environment of Intelligence Town and applying reinforcement learning policies to overall profit optimization.
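The abstract does not give the exact formulation of the attention reward, so the following Python sketch is only an illustration of the idea: a conventional distance-based intrinsic reward weights every state dimension equally, while a contingent-weighted variant down-weights dimensions the agent's actions do not influence. All function names and the fixed weight vector are assumptions for illustration, not taken from the thesis.

```python
import numpy as np

def distance_intrinsic_reward(state: np.ndarray, subgoal: np.ndarray) -> float:
    """Conventional subgoal-based intrinsic reward: the negative Euclidean
    distance between the current state and the subgoal. Every state change,
    including random environment interference, shifts this value."""
    return -float(np.linalg.norm(state - subgoal))

def attention_intrinsic_reward(state: np.ndarray, subgoal: np.ndarray,
                               contingent_weight: np.ndarray) -> float:
    """Hypothetical attention reward: each state dimension is scaled by a
    contingent weight in [0, 1] describing how strongly the agent's own
    actions influence that dimension, so noise-dominated dimensions
    contribute less to the reward."""
    diff = state - subgoal
    return -float(np.sqrt(np.sum(contingent_weight * diff ** 2)))

# Toy usage: dimension 2 is assumed to be driven by environment noise
# (low contingent weight), so interference there barely changes the reward.
state = np.array([0.5, 1.0, 3.0])
subgoal = np.array([0.0, 1.0, 0.0])
w = np.array([1.0, 1.0, 0.05])  # assumed hand-fixed weights, not learned here
print(distance_intrinsic_reward(state, subgoal))      # dominated by the noisy dim 2
print(attention_intrinsic_reward(state, subgoal, w))  # mostly reflects dim 0
```

In the thesis the contingent weight is produced by the dimension-erasure method rather than fixed by hand; the hand-set vector above stands in purely so the example runs.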
Keywords/Search Tags: Hierarchical reinforcement learning, deep reinforcement learning, intrinsic reward, contingency awareness