
Research on Reinforcement Learning in Parameterized Action Spaces with Sparse Rewards

Posted on: 2024-05-23
Degree: Master
Type: Thesis
Country: China
Candidate: J K Song
Full Text: PDF
GTID: 2568306932462084
Subject: Computer application technology
Abstract/Summary:
In recent years, deep reinforcement learning has achieved great success in fields such as games and robot control and has attracted widespread attention. Common deep reinforcement learning algorithms can usually handle only discrete or continuous action spaces, but the parameterized action space is closer to real-world problems: discrete actions carry continuous parameters, allowing more precise control. In addition, many real-world tasks face the sparse reward problem, which poses great challenges for deep reinforcement learning, especially in complex parameterized action spaces.

This thesis first proposes an improved algorithm based on the hybrid proximal policy optimization algorithm for reinforcement learning in parameterized action spaces. It separates the continuous policies to reduce interference between the continuous policies of different discrete actions, and introduces action masks to speed up training and flexibly control the action space (a minimal sketch of this hybrid structure is given below). Reinforcement learning alone, however, struggles with complex scenarios, especially under sparse rewards. For the sparse reward problem in parameterized action spaces, this thesis therefore proposes two methods.

Curriculum learning accelerates deep reinforcement learning by splitting a task into a sequence of progressively more difficult sub-tasks. Building on this idea, the thesis proposes a new curriculum learning method with macro actions for parameterized action spaces. Unlike previous curriculum learning methods, it focuses on the action space: it reduces the difficulty of initial training by introducing several pre-designed macro actions, then designs a task sequence in which the action space gradually shrinks, removing the macro actions step by step until the original action space is restored. This reduces the impact of imperfectly designed macro actions on the final policy (see the curriculum sketch below).

The thesis then generalizes the macro actions of the curriculum method into source policies and proposes an adaptive multi-source policy transfer method. It treats the source policies as additional discrete actions, lets the agent automatically select the appropriate source policy through reinforcement learning, and transfers knowledge from the selected source policy to the target policy by policy distillation (see the distillation sketch below). This makes full use of the source policies and, to some extent, avoids negative transfer. A target task can thus be decomposed into several simple sub-tasks, whose learned sub-policies serve as source policies for learning the target policy.

Experiments in the RoboCup 2D environment demonstrate the effectiveness of both methods. In both the single-agent and multi-agent scenarios of Half Field Offense, the methods train effective offensive policies under sparse rewards and achieve high scoring rates.
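The hybrid-policy design described in the abstract can be made concrete with a short sketch. The following PyTorch code is illustrative only, not the thesis's exact architecture: it gives each discrete action its own continuous-parameter head (so the parameter heads of different discrete actions do not interfere through shared weights) and applies an action mask to the discrete logits before sampling. All names and dimensions (HybridPolicy, obs_dim, param_dims) are hypothetical.

    # A minimal sketch, assuming a PyTorch hybrid policy for a
    # parameterized action space: per-discrete-action parameter heads
    # plus an action mask on the discrete logits.
    import torch
    import torch.nn as nn

    class HybridPolicy(nn.Module):
        def __init__(self, obs_dim, n_discrete, param_dims):
            super().__init__()
            self.body = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh())
            self.discrete_head = nn.Linear(64, n_discrete)
            # One independent head per discrete action, so updates to one
            # action's continuous parameters do not disturb the others'.
            self.param_heads = nn.ModuleList(
                [nn.Linear(64, d) for d in param_dims]
            )

        def forward(self, obs, mask):
            h = self.body(obs)
            logits = self.discrete_head(h)
            # Action mask: disallowed actions get -inf logits, hence zero
            # probability, and are never sampled.
            logits = logits.masked_fill(~mask, float("-inf"))
            dist = torch.distributions.Categorical(logits=logits)
            a = dist.sample()
            params = self.param_heads[a.item()](h)  # parameters of chosen action
            return a, params

    policy = HybridPolicy(obs_dim=10, n_discrete=3, param_dims=[2, 1, 2])
    obs = torch.randn(10)
    mask = torch.tensor([True, True, False])  # third action currently illegal
    action, params = policy(obs, mask)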
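The curriculum over the action space can likewise be sketched. Assuming the pre-designed macro actions are appended to the discrete action set and disabled through the same masking mechanism, a stage schedule might look like the following; the sizes and the one-macro-per-stage schedule are assumptions for illustration, not the thesis's task sequence.

    # A hypothetical curriculum schedule: training starts with extra macro
    # actions available, and each later stage masks one more of them out
    # until only the original action space remains.
    import torch

    N_ORIGINAL, N_MACRO = 3, 2  # illustrative sizes, not from the thesis

    def stage_mask(stage):
        """Stage 0 allows all actions; each later stage removes one macro."""
        mask = torch.ones(N_ORIGINAL + N_MACRO, dtype=torch.bool)
        if stage > 0:
            mask[-stage:] = False  # drop the last `stage` macro actions
        return mask

    for stage in range(N_MACRO + 1):
        print(stage, stage_mask(stage).tolist())
    # stage 0: all 5 actions usable; stage 2: only the 3 original actions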
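Finally, a hedged sketch of the policy-distillation step assumed in the multi-source transfer method: a KL-divergence term pulls the target policy's discrete action distribution toward that of the currently selected source policy, on top of the usual RL objective. The loss weight, temperature, and function names here are illustrative; the thesis's exact distillation objective may differ.

    # A minimal sketch of policy distillation: KL(source || target) over
    # the discrete action distribution, added to the RL loss.
    import torch
    import torch.nn.functional as F

    def distill_loss(target_logits, source_logits, temperature=1.0):
        src = F.softmax(source_logits / temperature, dim=-1).detach()
        log_tgt = F.log_softmax(target_logits / temperature, dim=-1)
        # F.kl_div expects log-probs as input and probs as target.
        return F.kl_div(log_tgt, src, reduction="batchmean")

    target_logits = torch.randn(8, 5, requires_grad=True)  # batch of 8 states
    source_logits = torch.randn(8, 5)   # logits of the selected source policy
    rl_loss = torch.tensor(0.0)         # placeholder for the PPO objective
    loss = rl_loss + 0.5 * distill_loss(target_logits, source_logits)
    loss.backward()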
Keywords/Search Tags:Deep Reinforcement Learning, Parameterized Action Space, Sparse Reward, Curriculum Learning, Macro Actions, Transfer Learning