
Research And Implementation Of Imitation Learning For Complex Tasks In Large-scale Environments

Posted on: 2022-11-18
Degree: Master
Type: Thesis
Country: China
Candidate: Q H Liu
Full Text: PDF
GTID: 2518306773985179
Subject: Automation Technology

Abstract/Summary:
The practical applications of AI technology are maturing thanks to the increase in computing power brought by advances in microelectronics and the continuous development of machine learning theory. However, with traditional machine learning methods based on supervised learning, it is difficult to obtain an agent that can independently perform continuous decision-making tasks in real-world environments such as autonomous driving and autonomous robot control. Reinforcement learning aims to solve continuous decision-making problems within the Markov Decision Process framework to obtain a reliable agent, but applying reinforcement learning to large-scale environments and complex tasks is difficult because of the limitations of the reward function. Imitation learning replaces the reward function with expert demonstrations and is therefore expected to solve this class of problems. Against this background, this paper investigates imitation learning methods, proposes a new imitation learning algorithm for large-scale environments and complex tasks, and uses it to construct a drone-racing autonomous flight model that is validated in simulation. The specific work is as follows:

(1) Imitation learning algorithms and their implementation in a large-scale environment close to the real world, on a complex task with practical significance, are investigated, and an algorithm for this setting is proposed. The algorithm uses generative adversarial imitation learning (GAIL) as a baseline and consists of three components: a high-dimensional state downscaling module, a policy network optimization module, and the main policy and discriminator networks.

(2) A high-dimensional state downscaling module is designed and implemented. It includes a fixed-matrix downscaling module that uses a transform matrix of image features, an embedding-like downscaling module based on trainable neural networks, and a fixed-network downscaling module that uses a pre-trained object detection network.

(3) A policy network optimization module based on the gated recurrent unit (GRU) and the attention mechanism is designed, and the main networks of the generator and discriminator are built. The GRU is used to alleviate the differences in occurrence time between states in large-scale environments and complex tasks, as well as the information loss caused by state downscaling. Meanwhile, to further balance exploration and exploitation in large-scale environments, the Action Differences Attention (ADA) algorithm, based on the attention mechanism, is proposed and combined with the GRU as the optimization module of the policy network. Two sets of main policy networks are constructed: one based on an on-policy algorithm with stable performance and one based on an off-policy algorithm with high computational efficiency and data utilization. The main discriminator network is built from dense layers to improve training speed.

Using the above algorithm, a model for autonomous flight in the AirSim Drone Racing Lab (ADRL) UAV race is constructed and used to complete the racing mission of Soccer Field Easy and the gate-crossing mission of Soccer Field Medium. In addition, the performance of the different designs of the above modules is compared. First, the fixed-network design combines strong performance on the actual task with training stability: over 1000 flight training sessions on the Soccer Field Easy racing task, its average pass rate was 12.6 times higher than that of the fixed-matrix design and 3.1 times higher than that of the embedding-like design, while the variance of its Kullback-Leibler divergence was only 57% of the embedding-like design's. Then, over 2000 flight training sessions on the Soccer Field Easy racing task, the model with the policy network optimization module improved the average successful crossing rate by 70.6% compared with the model without it; the average missed-pass rate decreased by 18.8%, the average number of collisions decreased by 14.1%, and the number of serious crashes decreased by 38.5%. In addition, the action-difference attention structure gradually increased its sensitivity to action differences over the course of flight training. Finally, the on-policy model performs much better than the off-policy model, and the number of mini-batches is experimentally shown to be an important hyperparameter affecting training stability.

Overall, this paper presents and validates an imitation learning algorithm for large-scale environments with complex tasks, proposes different designs for each module of the algorithm, and verifies the performance of these designs. The work provides inspiration and guidance for deeper subsequent practical applications of imitation learning.
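The fixed-matrix downscaling variant can be sketched as a simple linear projection of the flattened image state. The abstract does not specify the exact transform matrix, so the random Gaussian projection below is an assumption for illustration only:

```python
import numpy as np

def make_fixed_projection(in_dim: int, out_dim: int, seed: int = 0) -> np.ndarray:
    """Build a fixed (non-trainable) downscaling matrix.

    A random Gaussian projection is assumed here; the scaling keeps the
    expected squared norm of the projected vector close to the input's
    (Johnson-Lindenstrauss style). The thesis does not publish its matrix.
    """
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, 1.0 / np.sqrt(out_dim), size=(out_dim, in_dim))

def downscale(state_image: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Flatten a high-dimensional image state and project it down
    to a low-dimensional feature vector."""
    flat = state_image.reshape(-1)
    return proj @ flat

# Example: a 64x64 grayscale observation reduced to 32 features.
proj = make_fixed_projection(64 * 64, 32)
z = downscale(np.ones((64, 64)), proj)
```

The embedding-like variant replaces the fixed matrix with a trainable encoder, and the fixed-network variant replaces it with features from a frozen pre-trained object detector; only the source of the projection changes, not its role in the pipeline.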
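The Action Differences Attention (ADA) mechanism is only described at a high level above. One plausible minimal formulation, weighting each time step of a feature sequence by how much the action changed at that step, might look like the following; the scoring rule (L2 norm of consecutive action differences) is an assumption, not the thesis's published equation:

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()

def action_difference_attention(actions: np.ndarray,
                                features: np.ndarray) -> np.ndarray:
    """Attention over a trajectory driven by action differences.

    actions:  (T, action_dim) sequence of actions taken by the agent.
    features: (T, feat_dim) per-step features (e.g. GRU outputs).
    Returns a (feat_dim,) context vector in which steps with larger
    action changes receive larger attention weights.
    """
    # Difference of each action from the previous one; first step is zero.
    diffs = np.diff(actions, axis=0, prepend=actions[:1])  # (T, action_dim)
    scores = np.linalg.norm(diffs, axis=1)                 # bigger change -> bigger score
    weights = softmax(scores)                              # (T,), sums to 1
    return weights @ features                              # attention-weighted sum
```

When the action sequence is nearly constant the weights fall back to near-uniform averaging, while sharp maneuvers dominate the context vector, which matches the exploration-versus-exploitation motivation given for ADA.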
Keywords/Search Tags:Artificial Intelligence, Imitation Learning, Generative Adversarial Imitation Learning, Attention Mechanism, UAV automatic flight