Research And Implementation Of Reinforcement Learning Algorithm Based On Prior Knowledge

Posted on: 2024-07-21
Degree: Master
Type: Thesis
Country: China
Candidate: G J Li
GTID: 2568307079471194
Subject: Electronic information

Abstract/Summary:
Reinforcement learning (RL), an algorithmic framework for solving sequential decision problems, has been widely applied in fields such as autonomous driving, robot control, and competitive games. An RL agent learns its policy by trial and error and has outperformed humans in some decision-making tasks. However, problems such as sparse rewards, low sample efficiency, and overfitting to the training environment make RL inefficient in some scenarios and hard to apply in practice. In recent years, a growing number of researchers have introduced prior knowledge into RL to give the agent additional guidance and make learning more efficient, and this has become one of the active topics in RL research. This thesis focuses on two types of prior knowledge, demonstrations and prior policies, and proposes a method for integrating each of them into RL, thereby transferring prior knowledge from humans and from other agents to the target agent. Specifically, the research work of this thesis consists of the following three parts.

First, an RL algorithm that integrates demonstrations is proposed. For RL tasks with sparse rewards, the thesis links the agent to the demonstrations through distribution matching and uses reward shaping to give the agent additional guidance, encouraging it to imitate the demonstrated actions. To address the limitations that suboptimal demonstrations impose on policy learning, the thesis also introduces a maximum-entropy mechanism and designs a demonstration-skipping mechanism, which effectively prevents policy learning from converging to a local optimum.

Second, an RL algorithm based on prior policies is proposed. The method combines distillation of the prior policy with the target agent's own learning, and also uses the prior policy to help the target agent select actions, so that the agent obtains good initial performance while quickly learning the reward distribution. In addition, the method introduces a prioritized experience replay mechanism, so that when the prior policy fails in some states of the target environment, the agent can correct it quickly.

Finally, a 3D maze RL system that can integrate prior knowledge is designed and implemented. Built on the Unity game engine, the system provides a 3D maze experimental environment in which the proposed algorithms are deployed and verified. The system also lets researchers import prior knowledge into an RL model and export trained data for storage as prior knowledge, providing a convenient experimental environment for researchers working on related problems.
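The abstract does not specify how demonstrations are turned into a shaping signal. As a hedged illustration only, the sketch below swaps in a simple, commonly used stand-in for "distribution matching": a potential Phi(s, a) equal to the highest Gaussian-kernel similarity between the agent's state and any demonstrated state where the demonstrator chose the same action, applied in potential-based form r + gamma * Phi(s', a') - Phi(s, a). The function names, the kernel width `sigma`, and the vector-state/discrete-action representation are all assumptions, not the thesis's method; the maximum-entropy and demonstration-skipping mechanisms are not modelled.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical representation: states are float vectors, actions are discrete indices.
State = Sequence[float]
Demos = List[Tuple[State, int]]


def demo_potential(state: State, action: int, demos: Demos, sigma: float = 0.5) -> float:
    """Phi(s, a): the largest Gaussian-kernel similarity between s and any
    demonstrated state in which the demonstrator took the same action a."""
    best = 0.0
    for demo_state, demo_action in demos:
        if demo_action != action:
            continue  # only demonstrated pairs with the same action count
        sq_dist = sum((x - y) ** 2 for x, y in zip(state, demo_state))
        best = max(best, math.exp(-sq_dist / (2.0 * sigma ** 2)))
    return best


def shaped_reward(reward: float, state: State, action: int,
                  next_state: State, next_action: int, demos: Demos,
                  gamma: float = 0.99, done: bool = False) -> float:
    """Environment reward plus F = gamma * Phi(s', a') - Phi(s, a).
    The potential of a terminal successor is taken to be 0."""
    phi = demo_potential(state, action, demos)
    phi_next = 0.0 if done else demo_potential(next_state, next_action, demos)
    return reward + gamma * phi_next - phi


if __name__ == "__main__":
    demos = [((0.0, 0.0), 1), ((1.0, 0.0), 1)]
    # Moving from a state far from the demonstrations onto a demonstrated
    # state-action pair earns a positive bonus even though the reward is 0.
    print(shaped_reward(0.0, (5.0, 5.0), 1, (1.0, 0.0), 1, demos))
```

The potential-based form is chosen because a bonus that is a difference of potentials rewards moving towards demonstrated behaviour without simply paying the agent for staying near it; with sparse environment rewards this gives a dense signal early in training.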
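The abstract says the prior policy is both distilled into the target agent and used to help it select actions, but gives neither the loss nor the schedule. The sketch below assumes one plausible arrangement: a KL(prior || target) distillation term added to the agent's own RL loss, and the same linearly decaying weight reused as the probability of sampling the action from the prior policy. `guidance_weight`, `decay_steps`, and the linear decay are hypothetical choices; a real implementation would compute the loss on tensors with automatic differentiation rather than plain floats.

```python
import math
import random
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def guidance_weight(step: int, decay_steps: int = 10_000) -> float:
    """Hypothetical schedule: prior-policy influence decays linearly from 1 to 0."""
    return max(0.0, 1.0 - step / decay_steps)


def combined_loss(rl_loss: float, prior_probs: Sequence[float],
                  student_logits: Sequence[float], step: int) -> float:
    """Target agent's own RL loss plus a weighted distillation term that pulls
    its action distribution towards the prior policy's."""
    distill = kl_divergence(prior_probs, softmax(student_logits))
    return rl_loss + guidance_weight(step) * distill


def select_action(prior_probs: Sequence[float], student_logits: Sequence[float],
                  step: int, rng: random.Random) -> int:
    """With probability guidance_weight(step), sample from the prior policy;
    otherwise sample from the target agent's own policy."""
    use_prior = rng.random() < guidance_weight(step)
    probs = list(prior_probs) if use_prior else softmax(student_logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Early in training the agent mostly acts like the prior policy and is pulled towards it, which is one way to obtain good initial performance; as the weight decays, its own RL loss dominates and it can diverge from the prior where the prior is wrong.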
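Prioritized experience replay is a standard technique; the abstract does not say which variant the thesis uses. The sketch below shows a minimal proportional variant: each transition's priority is (|TD error| + eps) ** alpha, sampling is proportional to priority, and importance-sampling weights (N * P(i)) ** (-beta) correct the resulting bias. The intuition matching the abstract is that transitions where the prior policy misleads the agent tend to have large TD errors and are therefore replayed more often. The class name and the defaults `alpha=0.6`, `beta=0.4` are illustrative assumptions.

```python
import random
from typing import Any, List, Optional, Sequence, Tuple


class ProportionalReplay:
    """Minimal proportional prioritized replay: transition i is sampled with
    probability p_i / sum_j p_j, where p_i = (|td_error_i| + eps) ** alpha.
    List-based with O(N) sampling; a sum-tree would be used at scale."""

    def __init__(self, alpha: float = 0.6, eps: float = 1e-6) -> None:
        self.alpha = alpha
        self.eps = eps
        self.transitions: List[Any] = []
        self.priorities: List[float] = []

    def add(self, transition: Any, td_error: Optional[float] = None) -> None:
        # Without a TD error yet, use the current maximum priority so the new
        # transition is likely to be replayed soon.
        if td_error is None:
            priority = max(self.priorities, default=1.0)
        else:
            priority = (abs(td_error) + self.eps) ** self.alpha
        self.transitions.append(transition)
        self.priorities.append(priority)

    def sample(self, batch_size: int, beta: float = 0.4,
               rng: Optional[random.Random] = None) -> Tuple[List[int], List[Any], List[float]]:
        rng = rng or random.Random()
        total = sum(self.priorities)
        probs = [p / total for p in self.priorities]
        idxs = rng.choices(range(len(probs)), weights=probs, k=batch_size)
        n = len(probs)
        # Importance-sampling weights correct the bias from non-uniform sampling;
        # here they are normalised by the largest weight in the batch.
        weights = [(n * probs[i]) ** (-beta) for i in idxs]
        max_w = max(weights)
        return idxs, [self.transitions[i] for i in idxs], [w / max_w for w in weights]

    def update(self, idxs: Sequence[int], td_errors: Sequence[float]) -> None:
        # After a learning step, refresh priorities with the new TD errors.
        for i, delta in zip(idxs, td_errors):
            self.priorities[i] = (abs(delta) + self.eps) ** self.alpha
```

A typical loop calls `sample`, scales each transition's loss by its weight, then passes the new TD errors to `update`.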
Keywords/Search Tags:Reinforcement Learning, Prior Knowledge, Demonstration, Reward Shaping, Prior Policy