Optimization For Generative Modeling And Its Applications In Imitation Learning

Posted on:2021-03-27

Degree:Master

Type:Thesis

Country:China

Candidate:K F Zhang

GTID:2428330647950759

Subject:Computer technology

Abstract/Summary:

Generative models have a wide range of applications. They can be used to model many kinds of data, such as images, text, and sound. Deep generative models, in particular, use the ability of deep neural networks to approximate arbitrary functions in order to model complex distributions.

As an important part of machine learning, reinforcement learning is becoming increasingly popular in robotic systems, machine translation, recommendation systems, and other areas. Traditional reinforcement learning methods learn from interactions with the environment. However, most sequential decision problems cannot provide the immediate reward signal that these algorithms need, and this has become the bottleneck in applying reinforcement learning to more complex decision problems. Inverse reinforcement learning addresses this by designing an optimization model that recovers the reward function of a Markov decision process. Imitation learning algorithms that combine inverse reinforcement learning with direct reinforcement learning have made great progress, with applications including the Stanford autonomous helicopter, HVAC control, and self-driving cars. Because imitation learning can make decisions with the help of expert demonstrations, it is attracting growing interest.

In this thesis, we first propose a new generative adversarial network, SSGAN. It attributes mode collapse to a failure of the initial generated manifold. SSGAN adds a supervisory signal to GANs to keep the generated manifold close to the real data manifold during initialization. The experimental results show that this method outperforms other recent GAN training methods in both visual quality and mode coverage.

We then propose a new unknown-area-aware imitation learning algorithm, UAIL, which learns only the distribution of transitions in the expert data rather than directly comparing against the expert data, so that the actor better captures expert behavior from the demonstrations. We extend GAIL to UAIL by adding an auto-encoder that predicts the similarity between a sampled state and the expert states, which yields a new criterion for training the discriminator. In this way, the policy can better capture the behavior of the expert and reduce the influence of the environment. Experiments show that the new method surpasses GAIL in several Atari 2600 games and multiple MuJoCo environments.

Finally, this thesis introduces an application of imitation learning on the JD e-commerce platform. To overcome the high cost of training reinforcement learning for commodity search directly on jd.com, we use GAN-SD and MAIL to build a virtual JD simulator from historical data. The empirical results show that the simulator faithfully reflects the characteristics of the real environment. A search engine policy is then trained in the virtual JD environment with the proposed approach, and the results show that it performs better than the traditional supervised learning method.
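As background for the GAIL-based method described above, the following is a minimal sketch of a standard GAIL-style discriminator objective and the surrogate reward passed to the policy. It is not the thesis's UAIL criterion: the abstract does not specify how the auto-encoder similarity enters the loss, so that term is left out. The function names, the NumPy implementation, and the labeling convention (expert pairs as 1, policy pairs as 0) are assumptions made for illustration.

```python
import numpy as np

EPS = 1e-8  # small constant to keep the logarithms finite


def sigmoid(x):
    """Map raw discriminator logits to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def discriminator_loss(expert_logits, policy_logits):
    """Binary cross-entropy for a GAIL-style discriminator.

    Assumed convention: expert (state, action) pairs are labeled 1 and
    policy-generated pairs are labeled 0.
    """
    d_expert = sigmoid(np.asarray(expert_logits, dtype=float))
    d_policy = sigmoid(np.asarray(policy_logits, dtype=float))
    expert_term = np.mean(np.log(d_expert + EPS))
    policy_term = np.mean(np.log(1.0 - d_policy + EPS))
    return -(expert_term + policy_term)


def surrogate_reward(policy_logits):
    """Reward for the policy: larger when the discriminator
    believes the pair came from the expert."""
    d_policy = sigmoid(np.asarray(policy_logits, dtype=float))
    return -np.log(1.0 - d_policy + EPS)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical logits standing in for a trained discriminator's outputs.
    expert_logits = rng.normal(loc=1.0, size=32)
    policy_logits = rng.normal(loc=-1.0, size=32)
    print("discriminator loss:", discriminator_loss(expert_logits, policy_logits))
    print("mean surrogate reward:", surrogate_reward(policy_logits).mean())
```

In a full training loop, the discriminator would be updated to lower this loss while the policy is updated with a reinforcement learning algorithm to increase the surrogate reward; a method like the one in the abstract would modify the discriminator's criterion using the auto-encoder's similarity estimate.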

Keywords/Search Tags:

Machine learning, reinforcement learning, generative modeling, imitation learning, inverse reinforcement learning

Related items

1	Inverse Reinforcement Learning And Imitation Learning With Applications In Intelligent Robotics
2	Research On Decision Distribution Modeling In Reinforcement Learning
3	Supervised Reinforcement Learning: Methods And Applications
4	Research On Control Algorithm Of Bicycle Robot Based On Inverse Reinforcement Learning
5	Research On Uncertainty-weighted Offline Reinforcement Learning
6	Research On Machine Learning Algorithms Based On Planning Network Model
7	Research On Reinforcement Learning Method For Game Manipulation Behavior Imitation
8	Reinforcement Learning Agent Design Based On Deep Perception And Imitation Learning
9	End-To-End Active Tracking System Via Deep Reinforcement Learning
10	Research And Implementation Of Deep Reinforcement Learning Algorithm Based On Offline And Online Mixed Strategies