
Research On Exploration Strategy In Deep Reinforcement Learning

Posted on: 2022-06-14
Degree: Master
Type: Thesis
Country: China
Candidate: M K Li
Full Text: PDF
GTID: 2518306533472404
Subject: Control Science and Engineering
Abstract/Summary:
Reinforcement learning is an important branch of machine learning in which an agent seeks an optimal policy through interaction with its environment. Deep reinforcement learning combines the perceptual ability of deep learning with the decision-making ability of reinforcement learning and has been widely applied to complex decision-making problems such as robot control. It enables agents to act and learn in complex environments, but it also raises new challenges for the exploration-exploitation trade-off in reinforcement learning, especially in tasks with continuous actions. To address this problem, the main research contents of this thesis are:

(1) To address the inability of existing exploration strategies to explore effectively in continuous-action tasks, an exploration strategy based on count value functions is proposed. First, an action count value function and a state count value function are designed: the action count value function is used to improve the policy, while the state count value function controls the range of action selection. In addition, this thesis analyzes three factors that affect value-function updates and proposes a target-value calculation method based on the value update path, which improves the stability of value-function updates.

(2) To address the inability of existing exploration strategies to explore efficiently in sparse-reward environments, an exploration strategy based on a generative model is proposed. First, the feasibility of modeling the environment with a conditional generative adversarial network is analyzed, with the generator serving as a predictive model of the environment. Second, to improve the stability of the generative model, the generator's network structure and loss function are improved, and a value-prediction network is added to the generator. Then, a reasonable intrinsic reward function is designed from the generative model. Finally, the reshaped reward is used to guide the agent's exploration.

(3) A well-designed reward function can help the agent find the optimal policy quickly. Many methods exist for improving the reward function, but a hand-crafted reward function does not necessarily improve the performance of a reinforcement learning algorithm. For this reason, this thesis proposes using an intrinsic reward function to improve the exploration strategy. First, an adaptive coefficient between the extrinsic reward and the entropy of the policy is designed; then the parameters of the intrinsic reward function are updated by maximizing the extrinsic reward and the policy's entropy; finally, the modified reward function is used to optimize the policy.

Results on the Gym and MuJoCo experimental platforms show that, compared with current mainstream deep reinforcement learning exploration algorithms, the algorithms proposed in this thesis obtain higher returns in less time. The thesis contains 17 figures, 12 tables, and 80 references.
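The count-based exploration idea in contribution (1) can be illustrated with a standard count-based bonus. The thesis's exact count value functions are not specified in this abstract, so the following is only a minimal sketch of the general technique: actions are scored by their value estimate plus a bonus that shrinks as the state-action pair is visited more often. The names `Q`, `counts`, and `beta` are illustrative, not from the thesis.

```python
import math
from collections import defaultdict

def count_bonus_action(Q, counts, state, actions, beta=0.5):
    """Pick the action maximizing Q(s, a) plus a count-based bonus.

    The bonus beta / sqrt(1 + N(s, a)) favors rarely tried actions,
    a common stand-in for an action count value function.
    """
    def score(a):
        return Q[(state, a)] + beta / math.sqrt(1 + counts[(state, a)])
    return max(actions, key=score)

# Usage: with equal Q-values, the less-visited action wins the bonus.
Q = defaultdict(float)
counts = defaultdict(int)
counts[('s0', 'left')] = 100          # 'left' has been tried many times
chosen = count_bonus_action(Q, counts, 's0', ['left', 'right'])
```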
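Contribution (2) uses a conditional GAN's generator as a predictive model of the environment and derives an intrinsic reward from it. As a simplified stand-in for that generator, the sketch below uses a trivial tabular forward model: the intrinsic reward is the model's prediction error for the observed next state, so novel transitions yield large rewards that decay as the model learns. The function name and the scalar-state representation are assumptions for illustration only.

```python
def intrinsic_reward(model, state, action, next_state, lr=0.1):
    """Return the forward model's prediction error as an intrinsic reward.

    model maps (state, action) -> predicted next state (a scalar here).
    The error is computed before nudging the prediction toward the
    observed next state, so unfamiliar transitions are rewarded most.
    """
    pred = model.get((state, action), 0.0)
    error = abs(next_state - pred)
    model[(state, action)] = pred + lr * (next_state - pred)
    return error

# Usage: the first visit to a transition is highly rewarded; repeats are not.
model = {}
first = intrinsic_reward(model, 0, 1, 5.0)
for _ in range(50):
    intrinsic_reward(model, 0, 1, 5.0)
later = intrinsic_reward(model, 0, 1, 5.0)
```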
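Contribution (3) balances the extrinsic reward against the policy's entropy with an adaptive coefficient. A common realization of this idea, used here purely as an illustrative sketch (the thesis's own update rule is not given in the abstract), is SAC-style temperature tuning: the coefficient rises when the policy's entropy falls below a target and falls otherwise, and the shaped reward adds an entropy bonus. All names and the gradient step on the log-coefficient are assumptions.

```python
import math

def update_alpha(alpha, policy_probs, target_entropy, lr=0.05):
    """Adapt the entropy coefficient toward a target entropy.

    Raises alpha when the policy's entropy is below target (encouraging
    exploration), lowers it otherwise. Stepping in log-space keeps
    alpha strictly positive.
    """
    entropy = -sum(p * math.log(p) for p in policy_probs if p > 0)
    log_alpha = math.log(alpha) + lr * (target_entropy - entropy)
    return math.exp(log_alpha)

def shaped_reward(extrinsic, log_prob, alpha):
    """Augment the extrinsic reward with an entropy bonus -alpha * log pi(a|s)."""
    return extrinsic + alpha * (-log_prob)

# Usage: a deterministic policy (zero entropy) pushes alpha up;
# a uniform policy over two actions (entropy ln 2) pushes it down.
a_up = update_alpha(0.2, [1.0, 0.0], target_entropy=0.5)
a_down = update_alpha(0.2, [0.5, 0.5], target_entropy=0.5)
```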
Keywords/Search Tags:deep reinforcement learning, exploration, value function, intrinsic reward