
The Decomposition And Reconstruction Of Complex Environment In Reinforcement Learning

Posted on: 2022-10-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y Xu
Full Text: PDF
GTID: 2518306725493254
Subject: Computer Science and Technology
Abstract/Summary:
Reinforcement learning (RL) is a field of machine learning that focuses on learning an optimal policy through interaction with an environment. In contrast to supervised learning, which relies on a given dataset, reinforcement learning acquires decision-making ability through self-exploration. RL has developed rapidly in recent years and has shown great potential for solving decision problems. However, current RL algorithms still have notable weaknesses, a critical one being sensitivity to the complexity of the environment: the efficiency and stability of RL training drop significantly when the target task is high-dimensional or the transition dynamics are complex. One way to address this is to make environment-specific adjustments to general RL algorithms, but doing so sacrifices generality. Our idea is instead to decompose and reconstruct the environment itself: if the environment can be decomposed to a lower level, existing RL algorithms become more readily applicable. Targeting two kinds of complex RL environments, this thesis proposes two concrete decomposition-and-reconstruction methods. In detail:

1. For complex many-goal environments, existing methods typically require manual decomposition, which generalizes poorly. We leverage a computer-vision technique for weakly supervised segmentation and propose an automatic reward-guided task decomposition method. It treats the reward as a global label for each observation, uses these labels to train a classifier that visually localizes the rewards, and then constructs the sub-tasks automatically. On top of this decomposition, we introduce a simple but effective hierarchical reinforcement learning structure. The method is evaluated in three different many-goal environments, where it produces accurate decompositions and improves RL performance.

2. For complex human-involved environments, the environment cannot be modeled simply by hand-written rules, because humans are involved. To build a simulator from data, we adopt generative adversarial imitation learning and propose a new RL training framework, which first decomposes the environment into modules and then applies multi-agent generative adversarial imitation learning to train the simulators. Offline and online experiments show that, with only a simple RL algorithm, the decomposed environment can produce a useful policy in the real-world environment.
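The reward-guided decomposition idea in (1) can be illustrated with a small toy sketch. Everything below is our own illustrative assumption, not the thesis's implementation: the grid "environment", the logistic classifier, and the rule that the classifier's largest weights are proposed as sub-goals all stand in for the actual weakly supervised segmentation pipeline.

```python
# Toy sketch of "reward as a global label of observation":
# each observation (a flattened occupancy grid) is labeled 1 iff it yielded
# reward, and a linear classifier's weights then highlight reward-bearing
# cells, which become sub-goals (one sub-task each).
import numpy as np

rng = np.random.default_rng(0)
H = W = 5
goal_cells = {(1, 3), (4, 0)}          # hidden reward locations (ground truth)

def sample_observation():
    """Random binary occupancy grid; label = 1 iff any goal cell is occupied."""
    obs = (rng.random((H, W)) < 0.3).astype(float)
    label = float(any(obs[r, c] > 0 for r, c in goal_cells))
    return obs.ravel(), label

# Collect "experience": observations with reward-derived global labels.
X, y = zip(*(sample_observation() for _ in range(2000)))
X, y = np.array(X), np.array(y)

# Train a logistic-regression classifier by gradient descent.
w, b = np.zeros(H * W), 0.0
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= 0.5 * (X.T @ (p - y)) / len(y)
    b -= 0.5 * np.mean(p - y)

# "Visually localize the rewards": the cells with the largest learned
# weights are proposed as sub-goals.
top = np.argsort(w)[-len(goal_cells):]
subgoals = sorted((int(i // W), int(i % W)) for i in top)
print(subgoals)  # should recover the hidden goal cells
```

In this sketch the classifier never sees the goal coordinates; it recovers them only from the reward labels, which is the sense in which the decomposition is automatic rather than manual.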
Keywords/Search Tags:Machine Learning, Reinforcement Learning, Hierarchical Reinforcement Learning, Task Decomposition