
Research on Reinforcement Learning in Parameterized Action Spaces with Sparse Rewards

Posted on: 2024-05-23
Degree: Master
Type: Thesis
Country: China
Candidate: J K Song
Full Text: PDF
GTID: 2568306932462084
Subject: Computer application technology
Abstract/Summary:
In recent years, deep reinforcement learning has achieved great success in fields such as games and robot control and has attracted widespread attention. Common deep reinforcement learning algorithms can usually handle only discrete or continuous action spaces, but the parameterized action space is closer to real-world problems: discrete actions carry continuous parameters, allowing more precise control. In addition, many real-world tasks face the sparse reward problem, which poses great challenges for deep reinforcement learning, especially in complex parameterized action spaces.

This thesis first proposes an improved algorithm based on the hybrid proximal policy optimization algorithm for reinforcement learning in parameterized action spaces. It separates the continuous policies to reduce interference between the continuous policies of different discrete actions, and introduces action masks to speed up training and flexibly control the action space (a minimal sketch of this hybrid structure is given below). Reinforcement learning alone, however, struggles with complex scenarios, especially under sparse rewards. For the sparse reward problem in parameterized action spaces, this thesis therefore proposes two methods.

Curriculum learning accelerates deep reinforcement learning by splitting a task into a sequence of progressively more difficult sub-tasks. Building on this idea, the thesis proposes a new curriculum learning method with macro actions for parameterized action spaces. Unlike previous curriculum learning methods, it focuses on the action space: it reduces the difficulty of initial training by introducing several pre-designed macro actions, then designs a task sequence in which the action space gradually shrinks, removing the macro actions step by step until the original action space is restored. This reduces the impact of imperfectly designed macro actions on the final policy (see the curriculum sketch below).

The thesis then generalizes the macro actions of the curriculum method into source policies and proposes an adaptive multi-source policy transfer method. It treats the source policies as additional discrete actions, lets the agent automatically select the appropriate source policy through reinforcement learning, and transfers knowledge from the selected source policy to the target policy by policy distillation (see the distillation sketch below). This makes full use of the source policies and, to some extent, avoids negative transfer. A target task can thus be decomposed into several simple sub-tasks, whose learned sub-policies serve as source policies for learning the target policy.

Experiments in the RoboCup 2D environment demonstrate the effectiveness of both methods. In both the single-agent and multi-agent scenarios of Half Field Offense, the methods train effective offensive policies under sparse rewards and achieve high scoring rates.
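The hybrid-policy design described in the abstract can be made concrete with a short sketch. The following PyTorch code is illustrative only, not the thesis's exact architecture: it gives each discrete action its own continuous-parameter head (so the parameter heads of different discrete actions do not interfere through shared weights) and applies an action mask to the discrete logits before sampling. All names and dimensions (HybridPolicy, obs_dim, param_dims) are hypothetical.

    # A minimal sketch, assuming a PyTorch hybrid policy for a
    # parameterized action space: per-discrete-action parameter heads
    # plus an action mask on the discrete logits.
    import torch
    import torch.nn as nn

    class HybridPolicy(nn.Module):
        def __init__(self, obs_dim, n_discrete, param_dims):
            super().__init__()
            self.body = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh())
            self.discrete_head = nn.Linear(64, n_discrete)
            # One independent head per discrete action, so updates to one
            # action's continuous parameters do not disturb the others'.
            self.param_heads = nn.ModuleList(
                [nn.Linear(64, d) for d in param_dims]
            )

        def forward(self, obs, mask):
            h = self.body(obs)
            logits = self.discrete_head(h)
            # Action mask: disallowed actions get -inf logits, hence zero
            # probability, and are never sampled.
            logits = logits.masked_fill(~mask, float("-inf"))
            dist = torch.distributions.Categorical(logits=logits)
            a = dist.sample()
            params = self.param_heads[a.item()](h)  # parameters of chosen action
            return a, params

    policy = HybridPolicy(obs_dim=10, n_discrete=3, param_dims=[2, 1, 2])
    obs = torch.randn(10)
    mask = torch.tensor([True, True, False])  # third action currently illegal
    action, params = policy(obs, mask)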
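The curriculum over the action space can likewise be sketched. Assuming the pre-designed macro actions are appended to the discrete action set and disabled through the same masking mechanism, a stage schedule might look like the following; the sizes and the one-macro-per-stage schedule are assumptions for illustration, not the thesis's task sequence.

    # A hypothetical curriculum schedule: training starts with extra macro
    # actions available, and each later stage masks one more of them out
    # until only the original action space remains.
    import torch

    N_ORIGINAL, N_MACRO = 3, 2  # illustrative sizes, not from the thesis

    def stage_mask(stage):
        """Stage 0 allows all actions; each later stage removes one macro."""
        mask = torch.ones(N_ORIGINAL + N_MACRO, dtype=torch.bool)
        if stage > 0:
            mask[-stage:] = False  # drop the last `stage` macro actions
        return mask

    for stage in range(N_MACRO + 1):
        print(stage, stage_mask(stage).tolist())
    # stage 0: all 5 actions usable; stage 2: only the 3 original actions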
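Finally, a hedged sketch of the policy-distillation step assumed in the multi-source transfer method: a KL-divergence term pulls the target policy's discrete action distribution toward that of the currently selected source policy, on top of the usual RL objective. The loss weight, temperature, and function names here are illustrative; the thesis's exact distillation objective may differ.

    # A minimal sketch of policy distillation: KL(source || target) over
    # the discrete action distribution, added to the RL loss.
    import torch
    import torch.nn.functional as F

    def distill_loss(target_logits, source_logits, temperature=1.0):
        src = F.softmax(source_logits / temperature, dim=-1).detach()
        log_tgt = F.log_softmax(target_logits / temperature, dim=-1)
        # F.kl_div expects log-probs as input and probs as target.
        return F.kl_div(log_tgt, src, reduction="batchmean")

    target_logits = torch.randn(8, 5, requires_grad=True)  # batch of 8 states
    source_logits = torch.randn(8, 5)   # logits of the selected source policy
    rl_loss = torch.tensor(0.0)         # placeholder for the PPO objective
    loss = rl_loss + 0.5 * distill_loss(target_logits, source_logits)
    loss.backward()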
Keywords/Search Tags:Deep Reinforcement Learning, Parameterized Action Space, Sparse Reward, Curriculum Learning, Macro Actions, Transfer Learning