
Research on Data-Efficient Third-Person Imitation Learning Methods

Posted on: 2021-02-05 | Degree: Master | Type: Thesis
Country: China | Candidate: C Jiang
GTID: 2428330614459404 | Subject: Computer technology
Abstract/Summary:
Reinforcement learning methods have achieved great success in many fields. In reinforcement learning, the agent must interact with the environment continually and update its policy according to the evaluative feedback the environment provides. However, when that feedback is sparse or entirely absent, reinforcement learning methods struggle. Imitation learning does not rely on feedback from the environment; instead, the agent learns a policy by observing expert demonstrations. This thesis addresses the high demands that traditional imitation learning methods place on expert demonstrations and proposes a data-efficient imitation learning method that can learn from third-person expert demonstrations. The main research contents can be summarized in three parts:

(1) Traditional imitation learning methods usually impose strict requirements on expert demonstrations: for example, the samples should be low-dimensional feature data, they should contain action information, and the expert and the agent should act in the same environment. These requirements seriously limit the real-world application of imitation learning. In practice, expert demonstrations usually exist as videos, and the expert's environment differs somewhat from the agent's. Such more easily obtained demonstrations are called third-person demonstrations. However, because third-person demonstrations differ from the samples generated by the agent and have no direct correspondence with them, they are difficult to use in imitation learning methods. To address this problem, this thesis proposes GAIfO-ID, an imitation learning method that learns from third-person demonstrations by combining generative adversarial imitation learning with an image difference mechanism. The thesis analyzes the data efficiency of the algorithm and demonstrates its superiority through experiments in multiple simulated environments.

(2) In generative adversarial imitation learning, whether the adversarial game between the discriminator and the policy stays balanced during training strongly affects the performance of the final learned policy. In third-person imitation learning tasks, the obvious domain difference between the expert demonstrations and the generated samples can easily make the discriminator too strong, so the policy receives little useful feedback for its updates during the game. To address this problem, this thesis introduces and improves the variational discriminator bottleneck, yielding GAIfO-ID-VDB. This method weakens the discriminator by restricting its discrimination of the generated samples, which prompts the discriminator to give the policy more accurate feedback.

(3) One of the main reasons third-person demonstrations are hard to apply in imitation learning is their lack of direct correspondence with the samples generated by the agent. To address this problem, this thesis adds an image translation module on top of GAIfO-ID-VDB. The module translates expert demonstrations from the third-person perspective to the first-person perspective, eliminating the domain difference between the expert demonstrations and the generated samples and enabling the agent to learn the expert policy from third-person demonstrations more effectively.
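The abstract does not spell out how the image difference mechanism in GAIfO-ID works, so the following is only a minimal sketch under a stated assumption, not the thesis's implementation. It assumes the "image difference" is the pixel-wise difference between consecutive frames, s_{t+1} - s_t, which a GAIfO-style discriminator then scores instead of raw frames; a static, domain-specific background offset present in both frames cancels out. The discriminator here is a hypothetical logistic-regression stand-in for a neural network, D(x) is read as the probability that x came from the expert, and the policy reward -log(1 - D(x)) is one common GAIL-style choice. The names `frame_difference` and `LogisticDiscriminator` are illustrative only.

```python
import math

# A grayscale frame: rows of pixel intensities (hypothetical minimal representation).
Frame = list[list[float]]


def frame_difference(frame_t: Frame, frame_next: Frame) -> list[float]:
    """Flattened pixel-wise difference s_{t+1} - s_t (assumed form of the image difference).

    Any per-pixel offset shared by both frames (for example, a background
    colour that differs between the expert's and the agent's environment)
    cancels out, leaving only what changed between the two frames.
    """
    return [b - a
            for row_a, row_b in zip(frame_t, frame_next)
            for a, b in zip(row_a, row_b)]


class LogisticDiscriminator:
    """Hypothetical stand-in for the discriminator network.

    D(x) = sigmoid(w . x + b) is the probability that the transition
    feature x (here, a frame difference) came from the expert.
    """

    def __init__(self, dim: int, lr: float = 0.5) -> None:
        self.w = [0.0] * dim
        self.b = 0.0
        self.lr = lr

    def prob_expert(self, x: list[float]) -> float:
        z = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, expert_batch: list[list[float]], agent_batch: list[list[float]]) -> None:
        """One gradient-ascent step on E_expert[log D] + E_agent[log(1 - D)]."""
        labelled = [(x, 1.0) for x in expert_batch] + [(x, 0.0) for x in agent_batch]
        grad_w = [0.0] * len(self.w)
        grad_b = 0.0
        for x, label in labelled:
            err = label - self.prob_expert(x)  # d(log-likelihood)/dz for a sigmoid output
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        n = len(labelled)
        self.w = [wi + self.lr * g / n for wi, g in zip(self.w, grad_w)]
        self.b += self.lr * grad_b / n

    def reward(self, x: list[float]) -> float:
        """GAIL-style surrogate reward for the policy: -log(1 - D(x))."""
        return -math.log(1.0 - self.prob_expert(x) + 1e-8)


# Usage: the expert's frames carry an extra background intensity of 0.5.
expert_diff = frame_difference([[1.5, 0.5, 0.5]], [[0.5, 1.5, 0.5]])  # object moves right
agent_wrong = frame_difference([[0.0, 1.0, 0.0]], [[1.0, 0.0, 0.0]])  # object moves left
agent_right = frame_difference([[1.0, 0.0, 0.0]], [[0.0, 1.0, 0.0]])  # moves right, no offset

disc = LogisticDiscriminator(dim=3)
for _ in range(50):
    disc.update([expert_diff], [agent_wrong])
print(expert_diff == agent_right)  # True: the background offset cancels out
print(disc.reward(agent_right) > disc.reward(agent_wrong))  # True
```

In this toy setting the expert and the imitating agent render different backgrounds, yet their frame differences coincide, so an agent that reproduces the expert's motion receives the same reward as the expert transition. Real image inputs would of course need a convolutional discriminator rather than a linear one.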
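The variational discriminator bottleneck named in part (2) can be illustrated in its standard form; the thesis's own modification (restricting the discrimination of the generated samples) is not described in enough detail here to reproduce, so the sketch below covers only the standard objective. In that form, the discriminator sees a stochastic code z ~ E(z|x) = N(mu(x), diag(sigma^2(x))) instead of x, and the average KL divergence between E(z|x) and the prior N(0, I) is held below a budget I_c by a Lagrange multiplier beta updated through dual gradient ascent. The function names, the toy discriminator, the budget of 0.5, and the step size are hypothetical; a real implementation would minimise this loss with automatic differentiation, whereas this plain-Python sketch only computes the values.

```python
import math
import random


def sample_code(mu: list[float], log_var: list[float], rng: random.Random) -> list[float]:
    """Reparameterised sample z = mu + sigma * eps with eps ~ N(0, I)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu: list[float], log_var: list[float]) -> float:
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) for a diagonal Gaussian encoder."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def vdb_discriminator_loss(p_expert: list[float], p_agent: list[float],
                           kls: list[float], beta: float,
                           info_budget: float) -> tuple[float, float]:
    """Discriminator/encoder loss under the information bottleneck constraint.

    p_expert, p_agent: D(z) (probability of 'expert') for codes sampled from the
    encoder on expert and agent inputs; kls: KL term for every input in the batch.
    Returns (loss, mean_kl).
    """
    eps = 1e-8  # numerical guard against log(0)
    bce = (-sum(math.log(p + eps) for p in p_expert) / len(p_expert)
           - sum(math.log(1.0 - p + eps) for p in p_agent) / len(p_agent))
    mean_kl = sum(kls) / len(kls)
    return bce + beta * (mean_kl - info_budget), mean_kl


def update_beta(beta: float, mean_kl: float, info_budget: float, step_size: float) -> float:
    """Dual gradient ascent on the Lagrange multiplier, clipped at zero."""
    return max(0.0, beta + step_size * (mean_kl - info_budget))


def toy_discriminator(z: list[float]) -> float:
    """Hypothetical fixed D(z) = sigmoid(sum(z)), for illustration only."""
    return 1.0 / (1.0 + math.exp(-sum(z)))


# Usage with made-up encoder outputs (mu, log_var) for one expert and one agent input.
rng = random.Random(0)
mu_e, lv_e = [1.2, -0.4], [-1.0, -0.5]
mu_a, lv_a = [0.1, 0.3], [0.0, -0.2]

p_e = [toy_discriminator(sample_code(mu_e, lv_e, rng))]
p_a = [toy_discriminator(sample_code(mu_a, lv_a, rng))]
kls = [kl_to_standard_normal(mu_e, lv_e), kl_to_standard_normal(mu_a, lv_a)]

beta = 0.1
loss, mean_kl = vdb_discriminator_loss(p_e, p_a, kls, beta, info_budget=0.5)
beta = update_beta(beta, mean_kl, info_budget=0.5, step_size=1e-3)
```

When the encoder's average KL exceeds the budget, beta grows and the penalty pushes the encoder to discard information about x, which limits how confidently the discriminator can separate expert from agent inputs; when the KL falls below the budget, beta shrinks toward zero and the constraint relaxes.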
Keywords/Search Tags: Imitation Learning, Third-Person Demonstrations, Image Difference, Variational Discriminator, Image Translation