Research And Implementation Of Reinforcement Learning Algorithm Based On Prior Knowledge

Posted on:2024-07-21

Degree:Master

Type:Thesis

Country:China

Candidate:G J Li

GTID:2568307079471194

Subject:Electronic information

Abstract/Summary:

Reinforcement learning (RL), an algorithmic framework for solving sequential decision problems, has been widely used in fields such as autonomous driving, robot control and adversarial games. It learns a policy by trial and error and outperforms humans in some decision-making tasks. However, because of sparse rewards, low sample efficiency and overfitting to the training environment, RL is inefficient in some scenarios and difficult to apply in practice. In recent years, a growing number of researchers have tried to introduce prior knowledge into RL to give the agent additional guidance and make learning more efficient, and this has become one of the active topics in RL research. This thesis focuses on two types of prior knowledge, demonstrations and prior policies, and proposes a method for integrating each of them into RL, thereby transferring prior knowledge from humans and from other agents to the target agent. Specifically, the research work of this thesis consists of the following three parts.

Firstly, an RL algorithm based on demonstration integration is proposed. For RL tasks with sparse rewards, this thesis connects the agent to the demonstrations by means of distribution matching and uses reward shaping to give the agent additional guidance, encouraging it to imitate the demonstrated actions. At the same time, to address the limitations that suboptimal demonstrations impose on policy learning, it introduces the maximum entropy mechanism and designs a demonstration-skipping mechanism, which effectively prevents policy learning from converging to a local optimum.

Secondly, an RL algorithm based on prior policies is proposed. This thesis combines distillation of the prior policy with self-learning by the target agent, and also uses the prior policy to help the target agent select actions, so that the agent obtains strong initial performance while quickly learning the reward distribution. In addition, the method introduces a prioritized experience replay mechanism, so that when the prior policy fails in some states of the target environment, the agent can correct it quickly.

Finally, a 3D maze RL system that can integrate prior knowledge is designed and implemented. Based on the Unity game development engine, this thesis designs and implements an RL system with a 3D maze experimental environment, and deploys and verifies the previously proposed algorithms in it. The system also lets researchers import prior knowledge into the RL model and export trained data for storage as prior knowledge, providing a convenient experimental environment for researchers working on related topics.
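
The abstract does not specify the distribution-matching objective or the exact shaping term used for the demonstration-based algorithm. As an illustration of the general idea only, the sketch below adds an imitation bonus to a sparse environment reward, assuming continuous state and action vectors and a Gaussian-kernel similarity to the nearest demonstrated step. The names `demo_bonus`, `shaped_reward`, `weight` and `sigma` are hypothetical and are not taken from the thesis.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical encoding: one demonstrated step = state features + action features.
DemoPair = Tuple[float, ...]


def demo_bonus(state: Sequence[float], action: Sequence[float],
               demos: List[DemoPair], sigma: float = 1.0) -> float:
    """Gaussian-kernel similarity between (state, action) and the nearest
    demonstrated pair: 1.0 on an exact match, decaying toward 0 with distance."""
    query = tuple(state) + tuple(action)
    nearest_sq = min(sum((x - d) ** 2 for x, d in zip(query, demo)) for demo in demos)
    return math.exp(-nearest_sq / (2.0 * sigma ** 2))


def shaped_reward(env_reward: float, state: Sequence[float], action: Sequence[float],
                  demos: List[DemoPair], weight: float = 0.5, sigma: float = 1.0) -> float:
    """Sparse environment reward plus a weighted bonus for demonstration-like actions."""
    return env_reward + weight * demo_bonus(state, action, demos, sigma)


# Two hypothetical demonstrated steps in a 2-D state space with a 1-D action.
demos = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(shaped_reward(0.0, (0.0, 0.0), (1.0,), demos))  # exact match with a demo step: 0.5
print(shaped_reward(0.0, (0.0, 0.0), (0.0,), demos))  # deviates from the demo action
```

A bonus of this form is not potential-based, so unlike potential-based shaping it can change which policy is optimal. One simple option for limiting the pull of a suboptimal demonstration is to reduce `weight` over the course of training.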
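
The maximum entropy mechanism mentioned for the first algorithm can be illustrated with the standard soft (Boltzmann) policy used in maximum-entropy RL. The sketch below is a generic example rather than the thesis's formulation; it assumes discrete actions with known Q-values and a temperature `alpha`, both hypothetical here. A larger `alpha` keeps probability on near-optimal alternatives, the property that helps discourage premature convergence to a single action.

```python
import math
from typing import List


def soft_value(q_values: List[float], alpha: float) -> float:
    """Soft state value V(s) = alpha * log(sum_a exp(Q(s, a) / alpha)),
    computed with the max-subtraction trick for numerical stability."""
    m = max(q_values)
    return m + alpha * math.log(sum(math.exp((q - m) / alpha) for q in q_values))


def soft_policy(q_values: List[float], alpha: float) -> List[float]:
    """Maximum-entropy (Boltzmann) policy pi(a|s) = exp((Q(s, a) - V(s)) / alpha)."""
    v = soft_value(q_values, alpha)
    return [math.exp((q - v) / alpha) for q in q_values]


def entropy(probs: List[float]) -> float:
    """Shannon entropy of a discrete distribution, in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


q = [1.0, 0.9, 0.0]  # hypothetical Q-values: a best action and a close runner-up
for alpha in (1.0, 0.01):
    pi = soft_policy(q, alpha)
    print(alpha, [round(p, 3) for p in pi], round(entropy(pi), 3))
```

With `alpha = 0.01` the policy is almost greedy, while with `alpha = 1.0` the runner-up action keeps a large share of the probability mass.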
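
For the second algorithm, the abstract states only that distillation of the prior policy is combined with the target agent's self-learning and that the prior policy also helps select actions. The following is a minimal sketch of one common way to do this, assuming discrete actions: the training loss adds a KL-divergence term toward the prior policy, and during data collection the agent samples from the prior with some probability. The linear annealing schedule and all function names are hypothetical choices made for illustration.

```python
import math
import random
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of action logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def combined_loss(rl_loss: float, prior_probs: Sequence[float],
                  target_logits: Sequence[float], distill_weight: float) -> float:
    """Target agent's own RL loss plus a distillation term pulling it toward the prior."""
    return rl_loss + distill_weight * kl_divergence(prior_probs, softmax(target_logits))


def linear_anneal(step: int, total_steps: int, start: float = 1.0, end: float = 0.0) -> float:
    """Linearly move a coefficient from `start` to `end` over `total_steps`."""
    frac = min(step / total_steps, 1.0)
    return start + frac * (end - start)


def select_action(prior_probs: Sequence[float], target_logits: Sequence[float],
                  use_prior_prob: float, rng: random.Random) -> int:
    """With probability `use_prior_prob`, sample from the prior policy;
    otherwise sample from the target agent's own policy."""
    probs = prior_probs if rng.random() < use_prior_prob else softmax(target_logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


rng = random.Random(0)
for step in (0, 5_000, 10_000):
    w = linear_anneal(step, total_steps=10_000)  # hypothetical schedule
    loss = combined_loss(0.3, [0.7, 0.2, 0.1], [0.0, 0.0, 0.0], distill_weight=w)
    action = select_action([0.7, 0.2, 0.1], [0.0, 0.0, 0.0], use_prior_prob=w, rng=rng)
    print(step, round(w, 2), round(loss, 4), action)
```

Annealing both coefficients toward zero lets the agent lean on the prior early, when its own policy is untrained, and rely on its own learning later. Whether the thesis uses a fixed, annealed or learned weighting is not stated in the abstract.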
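
Prioritized experience replay is a published technique (Schaul et al., 2016); the abstract does not say which variant the thesis uses or how priorities are computed. The sketch below shows the proportional variant under the usual assumption that a transition's priority is its absolute TD error plus a small constant. Transitions are sampled with probability P(i) = p_i^alpha / sum_k p_k^alpha and reweighted by importance-sampling weights w_i = (N * P(i))^(-beta), normalised here by the largest weight in the batch. A plain list is used instead of the sum-tree found in efficient implementations, and the class name is hypothetical.

```python
import random
from typing import Any, List, Sequence, Tuple


class PrioritizedReplay:
    """Proportional prioritized replay buffer backed by plain lists (O(N) sampling)."""

    def __init__(self, capacity: int, alpha: float = 0.6, beta: float = 0.4,
                 eps: float = 1e-6) -> None:
        self.capacity = capacity
        self.alpha = alpha      # how strongly priorities skew sampling (0 = uniform)
        self.beta = beta        # strength of the importance-sampling correction
        self.eps = eps          # keeps zero-error transitions sampleable
        self.data: List[Any] = []
        self.priorities: List[float] = []
        self.next_idx = 0       # ring-buffer write position

    def add(self, transition: Any) -> None:
        """Store a transition with the current maximum priority so it is replayed soon."""
        max_p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(max_p)
        else:
            self.data[self.next_idx] = transition
            self.priorities[self.next_idx] = max_p
        self.next_idx = (self.next_idx + 1) % self.capacity

    def probabilities(self) -> List[float]:
        """P(i) = p_i^alpha / sum_k p_k^alpha."""
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        return [s / total for s in scaled]

    def sample(self, batch_size: int, rng: random.Random
               ) -> Tuple[List[int], List[Any], List[float]]:
        """Sample indices by priority and return normalised importance-sampling weights."""
        probs = self.probabilities()
        n = len(self.data)
        idxs = rng.choices(range(n), weights=probs, k=batch_size)
        weights = [(n * probs[i]) ** (-self.beta) for i in idxs]
        max_w = max(weights)
        return idxs, [self.data[i] for i in idxs], [w / max_w for w in weights]

    def update(self, idxs: Sequence[int], td_errors: Sequence[float]) -> None:
        """Set each sampled transition's priority to |TD error| + eps."""
        for i, delta in zip(idxs, td_errors):
            self.priorities[i] = abs(delta) + self.eps


buffer = PrioritizedReplay(capacity=3, alpha=1.0)
for t in ("s0", "s1", "s2"):
    buffer.add(t)
buffer.update([0, 1, 2], [3.0, -1.0, 0.0])  # hypothetical TD errors
print(buffer.probabilities())               # transition 0 now dominates sampling
```

In the thesis's setting, states where the prior policy's actions turn out badly would tend to produce large TD errors, so under this scheme they would be replayed more often. This is one plausible reading of how replay helps the agent correct a failing prior quickly, not a description of the thesis's implementation.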

Keywords/Search Tags:

Reinforcement Learning, Prior Knowledge, Demonstration, Reward Shaping, Prior Policy

Related items

1	Research On Online-Boosting Reinforcement Learning Algorithm Based On Prior Knowledge And Multi-Task Learning
2	Research On Deep Reinforcement Learning Based On Prior Knowledge Extraction
3	Reinforcement Learning Control Methods Based On Prior Knowledge Model:Studies And Implementation
4	Research And Application Of Deep Reinforcement Learning Algorithms Based On Reward Shaping
5	Research And Application Of Reward Shaping Based Reinforcement Learning
6	Theories, Algorithms And Applications Of Policy Gradient Reinforcement Learning
7	Q-learning Potential Reward Online Learning Technology Inspired By Priori Knowledge
8	Using Task Prior In Reinforcement Learning Exploration
9	Image Dehazing Algorithm Based On Two Prior Knowledge
10	Research On Motion Control Method Of Quadruped Robot Based On Deep Reinforcement Learning