
Study On The Generative Adversarial Imitation Learning Based On State Features

Posted on: 2022-08-28    Degree: Master    Type: Thesis
Country: China    Candidate: Z Y Yan    Full Text: PDF
GTID: 2518306533972989    Subject: Control Engineering
Abstract/Summary:
Imitation learning is a research direction in artificial intelligence, much studied for solving agent decision-making problems, that aims to make an agent imitate the behavior of an expert on a given task. Generative adversarial imitation learning (GAIL) is a recent advance in imitation learning and is well suited to complex, large-scale problems. However, an agent initialized with a random policy usually differs greatly from the expert's underlying policy, so it takes the agent a long time to find the expert policy. When the agent encounters unseen states, a model trained directly on the expert demonstrations cannot make correct decisions, and its generalization performance is poor. In addition, the discriminator is optimized with a cross-entropy loss; once the discriminator judges the agent's state-action pairs to come from the demonstrations, it can no longer effectively guide the optimization of the generator. Since model performance depends on both the model structure and the model's input data, this thesis attempts to address these problems by modifying the discriminator's structure and its input. The research comprises the following two parts:

(1) Generative adversarial imitation learning based on generating approximate samples with state features. Imitation is a gradual process, and it is difficult for the agent to imitate the expert policy directly during training. If, instead, the expert demonstrations are divided into data generated by expert policies of different levels, the agent can progress step by step from a poor policy to a good one. Approximate samples of the expert demonstrations are likewise instructive for training the model. This thesis therefore proposes generative adversarial imitation learning based on generating approximate samples with state features. The main idea is to gradually increase the authenticity of the expert demonstrations, dividing them into data generated by expert policies of different levels. Experiments show that this method significantly improves model performance and exhibits excellent stability and generalization.

(2) Generative adversarial imitation learning based on state feature mapping. The discriminator distinguishes whether a state-action pair comes from the expert demonstrations or was generated by the agent's policy, and then guides the policy network to produce a policy close to the expert's. However, for data near the decision boundary that the discriminator judges to come from the expert demonstrations, continuing to optimize the discriminator with the cross-entropy loss provides little guidance for updating the policy network parameters, and the resulting state-action pairs remain imperfect. This thesis therefore proposes generative adversarial imitation learning based on state feature mapping. The main idea is to generate a reconstructed state through a state feature mapping and to use the difference between the reconstructed state and the original state to measure the distance between the agent policy and the expert policy; this distance in turn guides the policy network toward a policy close to the expert policy. Experimental results show that this method makes the agent policy converge quickly to the neighborhood of the expert policy.
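For reference, the standard GAIL discriminator that the abstract criticizes can be sketched as a binary classifier trained with the cross-entropy loss on expert versus agent state-action pairs. The sketch below is illustrative, not the thesis's implementation: a simple linear discriminator stands in for the neural network used in practice, and all function and variable names are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def discriminator_step(w, expert_sa, agent_sa, lr=0.1):
    """One gradient step on the cross-entropy loss of a linear
    discriminator D(s, a) = sigmoid(w . [s, a]).
    Expert pairs are labelled 1, agent pairs 0."""
    d_expert = sigmoid(expert_sa @ w)  # should move toward 1
    d_agent = sigmoid(agent_sa @ w)    # should move toward 0
    # Gradient of -mean(log D(expert)) - mean(log(1 - D(agent))) w.r.t. w
    grad = (-(expert_sa.T @ (1.0 - d_expert)) / len(expert_sa)
            + (agent_sa.T @ d_agent) / len(agent_sa))
    return w - lr * grad

def gail_reward(w, agent_sa):
    """Surrogate reward -log(1 - D(s, a)) handed to the policy optimizer."""
    return -np.log(1.0 - sigmoid(agent_sa @ w) + 1e-8)
```

Note that the gradient terms above scale with `1 - d_expert` and `d_agent`: once the discriminator confidently assigns agent pairs to the expert class, these terms vanish, which is the weak-guidance problem the thesis targets.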
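The abstract does not specify the exact form of the state feature mapping, so the sketch below uses a PCA-style linear mapping fitted on expert states as an illustrative stand-in: states are projected through the mapping and reconstructed, and the reconstruction error serves as the measure of how far a visited state lies from the expert's state distribution. All names here are assumptions for illustration.

```python
import numpy as np

def fit_state_mapping(expert_states, k=2):
    """Fit a linear state-feature mapping (top-k principal directions)
    on expert states. States the expert actually visits reconstruct
    well; states far from the expert distribution reconstruct poorly."""
    mu = expert_states.mean(axis=0)
    # SVD of the centred expert states gives the principal directions
    _, _, vt = np.linalg.svd(expert_states - mu, full_matrices=False)
    return mu, vt[:k]

def reconstruction_distance(states, mu, basis):
    """Per-state reconstruction error ||s - s_hat||, used here as a
    stand-in for the thesis's distance between agent and expert policy."""
    centred = states - mu
    recon = centred @ basis.T @ basis  # project onto expert subspace
    return np.linalg.norm(centred - recon, axis=1)
```

Under this sketch, a policy optimizer would be rewarded for visiting states with small reconstruction distance, steering the agent toward the expert's state distribution rather than relying on the discriminator's saturating cross-entropy signal.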
Keywords/Search Tags: generative adversarial imitation learning, approximate sample, state feature, expert policy