
Research And Implementation On Hierarchical Reinforcement Learning

Posted on: 2024-08-29  Degree: Master  Type: Thesis
Country: China  Candidate: S H Luo  Full Text: PDF
GTID: 2568306944459204  Subject: Information and Communication Engineering
Abstract/Summary:
With the development of deep reinforcement learning (DRL), agents have achieved human-expert-level performance on increasingly complex control tasks. To handle more challenging environments such as sparse-reward tasks, hierarchical reinforcement learning (HRL) employs multiple DRL agents as policies at different levels of a hierarchy, enabling the agent to make decisions with temporal and state abstraction. Subgoal-based HRL is a widely used HRL paradigm that breaks a complex task down into a set of subgoals and provides a denser intrinsic reward, making credit assignment much easier. The research in this thesis follows this architecture. Conventional subgoal-based HRL methods generate the intrinsic reward mainly from the distance between the current state and the subgoal. However, an intrinsic reward that depends solely on this distance fails to utilize additional task knowledge. Consequently, such methods cannot distinguish whether a state change results from the agent's actions or from the environment dynamics, which limits their sample efficiency. When the environment suffers random interference, existing HRL methods may fail to learn effective policies, because all of the interference is reflected in the distance between states and subgoals, and therefore in the intrinsic reward. To tackle this problem, this thesis proposes an Attention Reward (AR) based on a dimension-erasure method and the notion of contingency awareness. It encourages the agent to focus on state changes that result from its own actions and reduces the influence of random interference from the environment. To generate the attention reward, this thesis introduces a Contingent Weight that evaluates how much each dimension of the current state is affected by the agent's actions, and integrates it into conventional HRL methods to improve their sample efficiency. Our experiments show that the attention reward improves the sample efficiency of current HRL methods and that the contingent weight is interpretable. Furthermore, to be compatible with existing transition-relabeling methods, this thesis proposes Hierarchical Reinforcement Learning with Attention Reward (HiAR), which optimizes the reward granularity, reward scale, and reward testing process, enabling HiAR to further enhance sample efficiency through transition relabeling. HiAR improves sample efficiency on the Pendulum and UR5 Reacher tasks. Finally, this thesis presents applications of hierarchical reinforcement learning, including modeling a simulation environment of Intelligence Town and applying reinforcement learning policies to overall profit optimization.
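The abstract does not give the exact formulation of the attention reward, so the following Python sketch is only an illustration of the idea: a conventional distance-based intrinsic reward weights every state dimension equally, while a contingent-weighted variant down-weights dimensions the agent's actions do not influence. All function names and the fixed weight vector are assumptions for illustration, not taken from the thesis.

```python
import numpy as np

def distance_intrinsic_reward(state: np.ndarray, subgoal: np.ndarray) -> float:
    """Conventional subgoal-based intrinsic reward: the negative Euclidean
    distance between the current state and the subgoal. Every state change,
    including random environment interference, shifts this value."""
    return -float(np.linalg.norm(state - subgoal))

def attention_intrinsic_reward(state: np.ndarray, subgoal: np.ndarray,
                               contingent_weight: np.ndarray) -> float:
    """Hypothetical attention reward: each state dimension is scaled by a
    contingent weight in [0, 1] describing how strongly the agent's own
    actions influence that dimension, so noise-dominated dimensions
    contribute less to the reward."""
    diff = state - subgoal
    return -float(np.sqrt(np.sum(contingent_weight * diff ** 2)))

# Toy usage: dimension 2 is assumed to be driven by environment noise
# (low contingent weight), so interference there barely changes the reward.
state = np.array([0.5, 1.0, 3.0])
subgoal = np.array([0.0, 1.0, 0.0])
w = np.array([1.0, 1.0, 0.05])  # assumed hand-fixed weights, not learned here
print(distance_intrinsic_reward(state, subgoal))      # dominated by the noisy dim 2
print(attention_intrinsic_reward(state, subgoal, w))  # mostly reflects dim 0
```

In the thesis the contingent weight is produced by the dimension-erasure method rather than fixed by hand; the hand-set vector above stands in purely so the example runs.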
Keywords/Search Tags: Hierarchical reinforcement learning, deep reinforcement learning, intrinsic reward, contingency awareness