
Research On Online-Boosting Reinforcement Learning Algorithm Based On Prior Knowledge And Multi-Task Learning

Posted on: 2023-04-23
Degree: Master
Type: Thesis
Country: China
Candidate: Z X Liu
Full Text: PDF
GTID: 2568306914471634
Subject: Control Science and Engineering
Abstract/Summary:
Reinforcement learning, as an important branch of artificial intelligence, has made significant contributions in many application areas. However, efficiently resolving the exploration-exploitation dilemma in sparse-reward environments remains a major challenge for reinforcement learning and its improved variants. The prevailing approach combines imitation learning with reinforcement learning, but in practical applications it often requires an expert system that can provide state queries in real time. This thesis therefore proposes an online-boosting reinforcement learning algorithm based on prior knowledge and a multi-task mechanism. The main research content and results are as follows.

(1) Study of the algorithm architecture and simulation characteristics of imitation learning methods. The unmanned-vehicle adversarial problem is modeled mathematically, and common behavior cloning (BC) and inverse reinforcement learning algorithms are analyzed experimentally, providing a theoretical basis and experimental support for the subsequent research. To address the shortcomings of the existing simulation environment, the Unity-based unmanned-vehicle combat platform is optimized, improving physical simulation accuracy, simulation efficiency, visualization, and scalability.

(2) To address the inefficient use of expert data in sparse-reward environments, a proximal policy optimization (PPO) algorithm based on prior knowledge and a cross-entropy loss is proposed. The PPO loss function is augmented with a cross-entropy term over expert data; an adaptive weighting scheme based on the PPO truncation (clipping) factor guides the policy, and the data composition of the replay buffer is determined accordingly. The original expert data set is supplemented in real time with the agent's successful trajectories, forming a dynamic, implicit internal reward mechanism. Simulation results show that, compared with the BC, PPO, and PPO+BC algorithms, the proposed method converges faster and produces better policies.

(3) To address the unstable training and limited policy diversity of the generative adversarial imitation learning (GAIL) algorithm, a GAIL algorithm based on a multi-task mechanism is proposed. The generator network structure is improved with an asynchronous-advantage scheme, and policy networks for different auxiliary tasks are trained in parallel with centrally aggregated policy gradients. A binary-classification reward mechanism is proposed to simplify the training of the discriminator network, and combining it with zero-sum-game reasoning accelerates the convergence of the adversarial network to a Nash equilibrium. Simulation results show that the optimized generative adversarial model trains more stably and converges faster.
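The cross-entropy-augmented PPO loss described in (2) can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the function name `ppo_bc_loss`, the fixed `bc_weight` (standing in for the thesis's adaptive weighting scheme), and the discrete-action setting are all assumptions made for the example.

```python
import numpy as np

def ppo_bc_loss(ratio, advantage, expert_probs, expert_actions,
                clip_eps=0.2, bc_weight=0.5):
    """PPO clipped surrogate loss plus a cross-entropy term on expert data.

    `bc_weight` is a fixed placeholder for the thesis's adaptive weighting
    scheme; discrete actions are assumed for illustration.
    """
    # Clipped surrogate objective (negated, since PPO maximizes it)
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    ppo_loss = -np.mean(np.minimum(ratio * advantage, clipped * advantage))
    # Cross-entropy between the current policy and the expert's chosen actions
    picked = expert_probs[np.arange(len(expert_actions)), expert_actions]
    ce_loss = -np.mean(np.log(picked + 1e-8))
    return ppo_loss + bc_weight * ce_loss
```

The cross-entropy term pulls the policy toward the expert's action distribution, which is what lets expert data shape learning even when the environment reward is sparse.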
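The binary-classification reward mechanism described in (3) can be sketched as a standard GAIL-style discriminator objective; the function names and the specific reward form `-log(1 - D)` are illustrative assumptions, since the abstract does not specify them.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def discriminator_loss(expert_logits, agent_logits):
    # Binary cross-entropy: expert transitions labelled 1, agent transitions 0
    p_e, p_a = sigmoid(expert_logits), sigmoid(agent_logits)
    return -np.mean(np.log(p_e + 1e-8)) - np.mean(np.log(1.0 - p_a + 1e-8))

def gail_reward(agent_logits):
    # Generator (policy) reward: large when the agent fools the discriminator
    return -np.log(1.0 - sigmoid(agent_logits) + 1e-8)
```

Treating the discriminator as a plain binary classifier keeps its training simple, while the policy and discriminator form the zero-sum game whose convergence to a Nash equilibrium the thesis aims to accelerate.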
Keywords/Search Tags: proximal policy optimization, generative adversarial imitation learning, sparse reward, prior knowledge