
Research And Implementation Of Imitation Learning For Complex Tasks In Large-scale Environments

Posted on: 2022-11-18
Degree: Master
Type: Thesis
Country: China
Candidate: Q H Liu
Full Text: PDF
GTID: 2518306773985179
Subject: Automation Technology

Abstract/Summary:
The practical applications of AI technology are maturing thanks to the increase in computing power brought by advances in microelectronics and the continuous development of machine learning theory. However, with traditional machine learning methods based on supervised learning, it is difficult to obtain an agent that can independently perform continuous decision-making tasks in real-world environments such as autonomous driving and autonomous robot control. Reinforcement learning aims to solve continuous decision-making problems within the Markov Decision Process framework to obtain a reliable agent, but applying reinforcement learning to large-scale environments and complex tasks is difficult because of the limitations of the reward function. Imitation learning replaces the reward function with expert demonstrations and is therefore expected to solve this class of problems. Against this background, this paper investigates imitation learning methods, proposes a new imitation learning algorithm for large-scale environments and complex tasks, and uses it to construct a drone-racing autonomous flight model that is validated in simulation. The specific work is as follows:

(1) Imitation learning algorithms and their implementation in a large-scale environment close to the real world, on a complex task with practical significance, are investigated, and an algorithm for this setting is proposed. The algorithm uses generative adversarial imitation learning (GAIL) as a baseline and consists of three components: a high-dimensional state downscaling module, a policy network optimization module, and the main policy and discriminator networks.

(2) A high-dimensional state downscaling module is designed and implemented. It includes a fixed-matrix downscaling module that uses a transform matrix of image features, an embedding-like downscaling module based on trainable neural networks, and a fixed-network downscaling module that uses a pre-trained object detection network.

(3) A policy network optimization module based on the gated recurrent unit (GRU) and the attention mechanism is designed, and the main networks of the generator and discriminator are built. The GRU is used to alleviate the differences in occurrence time between states in large-scale environments and complex tasks, as well as the information loss caused by state downscaling. Meanwhile, to further balance exploration and exploitation in large-scale environments, the Action Differences Attention (ADA) algorithm, based on the attention mechanism, is proposed and combined with the GRU as the optimization module of the policy network. Two sets of main policy networks are constructed: one based on an on-policy algorithm with stable performance and one based on an off-policy algorithm with high computational efficiency and data utilization. The main discriminator network is built from dense layers to improve training speed.

Using the above algorithm, a model for autonomous flight in the AirSim Drone Racing Lab (ADRL) UAV race is constructed and used to complete the racing mission of Soccer Field Easy and the gate-crossing mission of Soccer Field Medium. In addition, the performance of the different designs of the above modules is compared. First, the fixed-network design combines strong performance on the actual task with training stability: over 1000 flight training sessions on the Soccer Field Easy racing task, its average pass rate was 12.6 times higher than that of the fixed-matrix design and 3.1 times higher than that of the embedding-like design, while the variance of its Kullback-Leibler divergence was only 57% of the embedding-like design's. Then, over 2000 flight training sessions on the Soccer Field Easy racing task, the model with the policy network optimization module improved the average successful crossing rate by 70.6% compared with the model without it; the average missed-pass rate decreased by 18.8%, the average number of collisions decreased by 14.1%, and the number of serious crashes decreased by 38.5%. In addition, the action-difference attention structure gradually increased its sensitivity to action differences over the course of flight training. Finally, the on-policy model performs much better than the off-policy model, and the number of mini-batches is experimentally shown to be an important hyperparameter affecting training stability.

Overall, this paper presents and validates an imitation learning algorithm for large-scale environments with complex tasks, proposes different designs for each module of the algorithm, and verifies the performance of these designs. The work provides inspiration and guidance for deeper subsequent practical applications of imitation learning.
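The fixed-matrix downscaling variant can be sketched as a simple linear projection of the flattened image state. The abstract does not specify the exact transform matrix, so the random Gaussian projection below is an assumption for illustration only:

```python
import numpy as np

def make_fixed_projection(in_dim: int, out_dim: int, seed: int = 0) -> np.ndarray:
    """Build a fixed (non-trainable) downscaling matrix.

    A random Gaussian projection is assumed here; the scaling keeps the
    expected squared norm of the projected vector close to the input's
    (Johnson-Lindenstrauss style). The thesis does not publish its matrix.
    """
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, 1.0 / np.sqrt(out_dim), size=(out_dim, in_dim))

def downscale(state_image: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Flatten a high-dimensional image state and project it down
    to a low-dimensional feature vector."""
    flat = state_image.reshape(-1)
    return proj @ flat

# Example: a 64x64 grayscale observation reduced to 32 features.
proj = make_fixed_projection(64 * 64, 32)
z = downscale(np.ones((64, 64)), proj)
```

The embedding-like variant replaces the fixed matrix with a trainable encoder, and the fixed-network variant replaces it with features from a frozen pre-trained object detector; only the source of the projection changes, not its role in the pipeline.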
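The Action Differences Attention (ADA) mechanism is only described at a high level above. One plausible minimal formulation, weighting each time step of a feature sequence by how much the action changed at that step, might look like the following; the scoring rule (L2 norm of consecutive action differences) is an assumption, not the thesis's published equation:

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()

def action_difference_attention(actions: np.ndarray,
                                features: np.ndarray) -> np.ndarray:
    """Attention over a trajectory driven by action differences.

    actions:  (T, action_dim) sequence of actions taken by the agent.
    features: (T, feat_dim) per-step features (e.g. GRU outputs).
    Returns a (feat_dim,) context vector in which steps with larger
    action changes receive larger attention weights.
    """
    # Difference of each action from the previous one; first step is zero.
    diffs = np.diff(actions, axis=0, prepend=actions[:1])  # (T, action_dim)
    scores = np.linalg.norm(diffs, axis=1)                 # bigger change -> bigger score
    weights = softmax(scores)                              # (T,), sums to 1
    return weights @ features                              # attention-weighted sum
```

When the action sequence is nearly constant the weights fall back to near-uniform averaging, while sharp maneuvers dominate the context vector, which matches the exploration-versus-exploitation motivation given for ADA.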
Keywords/Search Tags:Artificial Intelligence, Imitation Learning, Generative Adversarial Imitation Learning, Attention Mechanism, UAV automatic flight