
The Decomposition And Reconstruction Of Complex Environment In Reinforcement Learning

Posted on: 2022-10-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y Xu
Full Text: PDF
GTID: 2518306725493254
Subject: Computer Science and Technology
Abstract/Summary:
Reinforcement learning (RL) is a field of machine learning that focuses on learning an optimal policy through interaction with an environment. In contrast to supervised learning, which relies on a given dataset, reinforcement learning acquires decision-making ability through self-exploration. RL has developed rapidly in recent years and has shown great potential for solving decision problems. However, current RL algorithms still have notable weaknesses, a critical one being sensitivity to the complexity of the environment: the efficiency and stability of RL training drop significantly when the target task is high-dimensional or the transition dynamics are complex. One way to address this is to make environment-specific adjustments to general RL algorithms, but doing so sacrifices generality. Our idea is instead to decompose and reconstruct the environment itself: if the environment can be decomposed to a lower level, existing RL algorithms become more readily applicable. Targeting two kinds of complex RL environments, this thesis proposes two concrete decomposition-and-reconstruction methods. In detail:

1. For complex many-goal environments, existing methods typically require manual decomposition, which generalizes poorly. We leverage a computer-vision technique for weakly supervised segmentation and propose an automatic reward-guided task decomposition method. It treats the reward as a global label for each observation, uses these labels to train a classifier that visually localizes the rewards, and then constructs the sub-tasks automatically. On top of this decomposition, we introduce a simple but effective hierarchical reinforcement learning structure. The method is evaluated in three different many-goal environments, where it produces accurate decompositions and improves RL performance.

2. For complex human-involved environments, the environment cannot be modeled simply by hand-written rules, because humans are involved. To build a simulator from data, we adopt generative adversarial imitation learning and propose a new RL training framework, which first decomposes the environment into modules and then applies multi-agent generative adversarial imitation learning to train the simulators. Offline and online experiments show that, with only a simple RL algorithm, the decomposed environment can produce a useful policy in the real-world environment.
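The reward-guided decomposition idea in (1) can be illustrated with a small toy sketch. Everything below is our own illustrative assumption, not the thesis's implementation: the grid "environment", the logistic classifier, and the rule that the classifier's largest weights are proposed as sub-goals all stand in for the actual weakly supervised segmentation pipeline.

```python
# Toy sketch of "reward as a global label of observation":
# each observation (a flattened occupancy grid) is labeled 1 iff it yielded
# reward, and a linear classifier's weights then highlight reward-bearing
# cells, which become sub-goals (one sub-task each).
import numpy as np

rng = np.random.default_rng(0)
H = W = 5
goal_cells = {(1, 3), (4, 0)}          # hidden reward locations (ground truth)

def sample_observation():
    """Random binary occupancy grid; label = 1 iff any goal cell is occupied."""
    obs = (rng.random((H, W)) < 0.3).astype(float)
    label = float(any(obs[r, c] > 0 for r, c in goal_cells))
    return obs.ravel(), label

# Collect "experience": observations with reward-derived global labels.
X, y = zip(*(sample_observation() for _ in range(2000)))
X, y = np.array(X), np.array(y)

# Train a logistic-regression classifier by gradient descent.
w, b = np.zeros(H * W), 0.0
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= 0.5 * (X.T @ (p - y)) / len(y)
    b -= 0.5 * np.mean(p - y)

# "Visually localize the rewards": the cells with the largest learned
# weights are proposed as sub-goals.
top = np.argsort(w)[-len(goal_cells):]
subgoals = sorted((int(i // W), int(i % W)) for i in top)
print(subgoals)  # should recover the hidden goal cells
```

In this sketch the classifier never sees the goal coordinates; it recovers them only from the reward labels, which is the sense in which the decomposition is automatic rather than manual.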
Keywords/Search Tags:Machine Learning, Reinforcement Learning, Hierarchical Reinforcement Learning, Task Decomposition