
Robust Reinforcement Learning With Transfer And Meta Learning

Posted on: 2022-01-01  Degree: Master  Type: Thesis
Country: China  Candidate: S Y Jiang  Full Text: PDF
GTID: 2518306725993059  Subject: Computer Science and Technology
Abstract/Summary:
Reinforcement learning solves sequential decision-making problems with machine learning techniques. Compared with supervised learning, which only makes predictions on a fixed dataset, reinforcement learning actively interacts with the environment and improves performance by trial and error. Since most real-world problems can be regarded as sequential decision-making problems, reinforcement learning is seen as one of the most promising routes to "general artificial intelligence". In recent years, reinforcement learning, and especially deep reinforcement learning, has made impressive progress in many fields, including virtual applications such as Go, Dota and StarCraft, as well as real-world applications such as recommendation systems, task scheduling and robot control. However, current reinforcement learning methods are likely to fail in a wider range of real-world scenarios, and how to apply them to more tasks remains an open problem. The real world differs greatly from virtual environments: real-world systems are stochastic and constantly changing, and actions executed in the real world must satisfy strict safety constraints. For example, in autonomous driving, the vehicle must stop as soon as a pedestrian is detected ahead. Interacting with the real environment is also expensive and slow: a robotic arm may cost several hundred thousand yuan, and it executes far more slowly in the physical world than a physics engine simulates. Although part of a real-world learning task can be moved into a virtual environment by constructing a simulator, building a high-fidelity simulator is itself costly, and no matter how much the fidelity is improved, a gap between the simulator and the real environment will remain.

This thesis addresses several specific challenges in applying reinforcement learning to real-world scenarios, drawing on recent advances in meta-learning and transfer learning. The first two parts study how to robustly deploy reinforcement learning policies in a real environment, given a simulator that differs from that environment plus a small number of real-environment samples. By explicitly modeling the differences between the simulator and the environment, the thesis analyzes what measures to take when the differences lie mainly in the transition function or mainly in the observation model. When the difference lies mainly in the transition function, we combine observation-only generative adversarial imitation learning with an adaptive multi-step inverse dynamics model: a policy whose state distribution matches the expert data is trained in the simulator to generate a local target state, and at deployment the current state and the target state are fed into an inverse dynamics model trained on real-environment samples to recover an action. When the difference lies mainly in the observation model, we exploit the inherent sequential structure of the MDP to cast the domain adaptation problem as variational inference, and use a generative adversarial network to handle the KL divergence that is otherwise intractable to compute. We also design a dedicated residual structure for the long-trajectory RNN to stabilize its training.

The third part aims to improve the robustness of policies to unstable environments and diverse tasks. We adopt and extend a zero-shot meta-learning algorithm based on environmental context. By improving the context feature extractor of prior work, the thesis achieves a tighter integration of context feature learning and policy learning, so that the learned context features better capture the relations and differences between tasks and environments. In addition, by introducing temporal constraints during feature learning, the context feature extractor and the policy can adapt quickly when environment parameters change.
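The deploy-time combination described above — a simulator-trained policy proposes a local target state, and an inverse dynamics model fitted on real-environment samples recovers the action — can be sketched in a toy setting. The linear dynamics s' = s + B·a, the dimensions, and the linear least-squares inverse model below are all illustrative assumptions, not the thesis's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, action_dim, k = 4, 2, 3            # k-step lookahead (assumed)
B = rng.normal(size=(state_dim, action_dim))  # unknown "real" dynamics

# Collect real-environment tuples (s_t, a, s_{t+k}), holding the action
# fixed for k steps, under the assumed dynamics s' = s + B @ a.
n = 200
S = rng.normal(size=(n, state_dim))
A = rng.normal(size=(n, action_dim))
Sk = S + k * (A @ B.T)

# Fit a linear multi-step inverse dynamics model: a ~ [s, s_k] @ W.
X = np.hstack([S, Sk])
W, *_ = np.linalg.lstsq(X, A, rcond=None)

# Deployment: the policy supplies a target state (here synthesized from a
# known action so the recovery can be checked), and the inverse model maps
# (current state, target state) back to an action.
s = rng.normal(size=state_dim)
a_true = rng.normal(size=action_dim)
s_target = s + k * (B @ a_true)
a_recovered = np.hstack([s, s_target]) @ W
```

Because the toy dynamics are linear and noise-free, the least-squares inverse model recovers the action essentially exactly; with a real robot, the thesis's point is precisely that this model must be fitted on (scarce) real samples rather than on the simulator.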
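The variational formulation above involves a KL divergence with no closed form, which the thesis handles adversarially. The generic version of that trick — a binary discriminator whose logit approximates the log density ratio, from which the KL term is estimated — can be sketched as follows. The one-dimensional Gaussians, logistic discriminator, and plain gradient descent are illustrative assumptions standing in for the thesis's GAN.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
xp = rng.normal(1.0, 1.0, n)   # samples from p
xq = rng.normal(0.0, 1.0, n)   # samples from q

# Discriminator d(x) = sigmoid(w*x + b), trained to tell p from q with a
# binary cross-entropy loss; at its optimum the logit equals log p(x)/q(x).
w, b, lr = 0.0, 0.0, 0.05
x = np.concatenate([xp, xq])
y = np.concatenate([np.ones(n), np.zeros(n)])
for _ in range(3000):
    probs = 1.0 / (1.0 + np.exp(-(w * x + b)))
    grad = probs - y               # d(BCE)/d(logit)
    w -= lr * np.mean(grad * x)
    b -= lr * np.mean(grad)

# KL(p||q) = E_p[log p/q], estimated by the mean logit on p-samples.
kl_est = np.mean(w * xp + b)
kl_true = 0.5                      # closed form for N(1,1) vs N(0,1)
```

For these two Gaussians the optimal logit is exactly linear (x − 1/2), so the tiny discriminator suffices; in the thesis the same ratio is represented by a learned network over trajectories.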
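The context-based idea in the third part — infer a latent description of the current environment from recent transitions, then condition the policy on it so it adapts zero-shot when environment parameters change — can be illustrated with a deliberately tiny example. The scalar dynamics s' = s + a/m and the closed-form "encoder" below are illustrative assumptions; the thesis uses a learned neural context feature extractor.

```python
import numpy as np

rng = np.random.default_rng(1)

# Family of environments indexed by a hidden parameter m (e.g. a mass),
# with assumed one-dimensional dynamics s' = s + a / m.
def collect_transitions(m, n=64):
    s = rng.normal(size=n)
    a = rng.normal(size=n)
    return s, a, s + a / m

# "Context encoder": recover the effective 1/m from a batch of recent
# transitions; a context-conditioned policy would take this as input.
def infer_context(s, a, s_next):
    delta = s_next - s                 # equals a / m for these dynamics
    return float(a @ delta / (a @ a))  # least-squares slope, i.e. 1/m

contexts = {m: infer_context(*collect_transitions(m)) for m in (0.5, 1.0, 2.0)}
```

The encoder distinguishes the three environments from their transitions alone, which is the property the thesis strengthens: context features that separate tasks while remaining stable within one task, enforced there by the temporal constraints mentioned above.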
Keywords/Search Tags: Reinforcement Learning, Meta Learning, Transfer Learning