
Research On Online-Boosting Reinforcement Learning Algorithm Based On Prior Knowledge And Multi-Task Learning

Posted on: 2023-04-23
Degree: Master
Type: Thesis
Country: China
Candidate: Z X Liu
Full Text: PDF
GTID: 2568306914471634
Subject: Control Science and Engineering
Abstract/Summary:
Reinforcement learning, as an important branch of artificial intelligence, has made significant contributions in many application areas. However, efficiently resolving the exploration-exploitation dilemma in sparse-reward environments remains a major challenge for reinforcement learning and its improved variants. The prevailing approach combines imitation learning with reinforcement learning, but in practical applications it often requires an expert system that can provide state queries in real time. This thesis therefore proposes an online-boosting reinforcement learning algorithm based on prior knowledge and a multi-task mechanism. The main research content and results are as follows.

(1) Study of the algorithm architecture and simulation characteristics of imitation learning methods. The unmanned-vehicle adversarial problem is modeled mathematically, and common behavior cloning (BC) and inverse reinforcement learning algorithms are analyzed experimentally, providing a theoretical basis and experimental support for the subsequent research. To address the shortcomings of the existing simulation environment, the Unity-based unmanned-vehicle combat platform is optimized, improving physical simulation accuracy, simulation efficiency, visualization, and scalability.

(2) To address the inefficient use of expert data in sparse-reward environments, a proximal policy optimization (PPO) algorithm based on prior knowledge and a cross-entropy loss is proposed. The PPO loss function is augmented with a cross-entropy term over expert data; an adaptive weighting scheme based on the PPO truncation (clipping) factor guides the policy, and the data composition of the replay buffer is determined accordingly. The original expert data set is supplemented in real time with the agent's successful trajectories, forming a dynamic, implicit internal reward mechanism. Simulation results show that, compared with the BC, PPO, and PPO+BC algorithms, the proposed method converges faster and produces better policies.

(3) To address the unstable training and limited policy diversity of the generative adversarial imitation learning (GAIL) algorithm, a GAIL algorithm based on a multi-task mechanism is proposed. The generator network structure is improved with an asynchronous-advantage scheme, and policy networks for different auxiliary tasks are trained in parallel with centrally aggregated policy gradients. A binary-classification reward mechanism is proposed to simplify the training of the discriminator network, and combining it with zero-sum-game reasoning accelerates the convergence of the adversarial network to a Nash equilibrium. Simulation results show that the optimized generative adversarial model trains more stably and converges faster.
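The cross-entropy-augmented PPO loss described in (2) can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the function name `ppo_bc_loss`, the fixed `bc_weight` (standing in for the thesis's adaptive weighting scheme), and the discrete-action setting are all assumptions made for the example.

```python
import numpy as np

def ppo_bc_loss(ratio, advantage, expert_probs, expert_actions,
                clip_eps=0.2, bc_weight=0.5):
    """PPO clipped surrogate loss plus a cross-entropy term on expert data.

    `bc_weight` is a fixed placeholder for the thesis's adaptive weighting
    scheme; discrete actions are assumed for illustration.
    """
    # Clipped surrogate objective (negated, since PPO maximizes it)
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    ppo_loss = -np.mean(np.minimum(ratio * advantage, clipped * advantage))
    # Cross-entropy between the current policy and the expert's chosen actions
    picked = expert_probs[np.arange(len(expert_actions)), expert_actions]
    ce_loss = -np.mean(np.log(picked + 1e-8))
    return ppo_loss + bc_weight * ce_loss
```

The cross-entropy term pulls the policy toward the expert's action distribution, which is what lets expert data shape learning even when the environment reward is sparse.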
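The binary-classification reward mechanism described in (3) can be sketched as a standard GAIL-style discriminator objective; the function names and the specific reward form `-log(1 - D)` are illustrative assumptions, since the abstract does not specify them.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def discriminator_loss(expert_logits, agent_logits):
    # Binary cross-entropy: expert transitions labelled 1, agent transitions 0
    p_e, p_a = sigmoid(expert_logits), sigmoid(agent_logits)
    return -np.mean(np.log(p_e + 1e-8)) - np.mean(np.log(1.0 - p_a + 1e-8))

def gail_reward(agent_logits):
    # Generator (policy) reward: large when the agent fools the discriminator
    return -np.log(1.0 - sigmoid(agent_logits) + 1e-8)
```

Treating the discriminator as a plain binary classifier keeps its training simple, while the policy and discriminator form the zero-sum game whose convergence to a Nash equilibrium the thesis aims to accelerate.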
Keywords/Search Tags: proximal policy optimization, generative adversarial imitation learning, sparse reward, prior knowledge