
Research On Intrinsic Rewards For Reinforcement Learning

Posted on: 2023-10-21
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhao
Full Text: PDF
GTID: 2568306785464594
Subject: Computer Science and Technology
Abstract/Summary:
Traditional reinforcement learning uses only the extrinsic reward as the signal that guides the agent's decisions, so that the expected cumulative future reward is maximized. Sparse extrinsic rewards are a difficult problem in reinforcement learning: in a sparse-reward environment the agent receives a reward only upon reaching the goal, and the missing intermediate signals lead to slow learning, high cost, and underfitting of the policy network. To address reward sparsity, this thesis studies intrinsic reward methods for reinforcement learning agents. Targeting two problems of existing intrinsic reward methods, namely low exploration efficiency and the vanishing of the intrinsic reward, the following research work has been completed:

(1) The main role of an intrinsic reward is to drive the agent to keep exploring the environment, but exploration exposes the agent to unsafe actions, and existing intrinsic reward designs do not account for the risk of actions in the environment. This thesis therefore designs an intrinsic reward from both novelty and risk assessment, so that the agent can explore the environment thoroughly while accounting for the uncertain actions the environment contains. The method first defines novelty through the visit counts of the current state-action pair and of the post-transition state, taking the specific action performed into account; it then uses the variance of the cumulative reward to assess how risky the current action is in that state. The method was tested on discrete and continuous control tasks, and the experimental results verify that it achieves a higher average reward, including under delayed extrinsic rewards, indicating that it can effectively mitigate the problem of sparse extrinsic rewards.

(2) Existing intrinsic rewards gradually vanish as the agent continues to explore the environment, leaving the agent unable to use the intrinsic reward signal to learn the optimal policy. To address this, a method for acquiring composable skills based on intrinsic rewards is proposed. The method first searches for positive states during the interaction between the agent and the environment and selects subgoals among them; it then extracts skills from trajectories generated from the initial state to a subgoal, and from a subgoal to the terminal state, combining two or more subgoals; finally, it evaluates each skill by the distance from the initial state to the subgoal and by the cumulative reward collected from the initial state to the subgoal. The method achieves a high average reward in continuous control tasks, indicating that the proposed subgoals and skills can effectively solve the problem that the agent cannot learn the optimal policy once the intrinsic reward has vanished.
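To make the novelty-plus-risk design of (1) concrete, the following is a minimal sketch for a discrete environment with hashable states. All names here (IntrinsicRewardModule, beta, lambda_risk) are illustrative assumptions, not the thesis's actual implementation; the thesis does not specify the count decay or the weighting, so a common 1/sqrt(count) bonus is assumed.

```python
# Sketch (assumed, not the thesis's code): count-based novelty bonus over the
# current (state, action) pair and the post-transition state, combined with a
# risk penalty given by the variance of observed cumulative returns.
from collections import defaultdict
import math


class IntrinsicRewardModule:
    def __init__(self, beta=0.1, lambda_risk=0.05):
        self.beta = beta                    # weight of the novelty bonus (assumed)
        self.lambda_risk = lambda_risk      # weight of the risk penalty (assumed)
        self.sa_counts = defaultdict(int)   # visits to (state, action)
        self.s_counts = defaultdict(int)    # visits to post-transition states
        self.returns = defaultdict(list)    # observed returns per (state, action)

    def intrinsic_reward(self, state, action, next_state, episode_return):
        # Novelty: rarely tried (state, action) pairs and rarely reached
        # next states both yield a larger bonus, decaying as 1/sqrt(count).
        self.sa_counts[(state, action)] += 1
        self.s_counts[next_state] += 1
        novelty = (1.0 / math.sqrt(self.sa_counts[(state, action)])
                   + 1.0 / math.sqrt(self.s_counts[next_state]))

        # Risk: variance of the cumulative returns observed after taking this
        # action in this state (supplied e.g. when relabeling transitions at
        # episode end); high variance marks the action as risky.
        self.returns[(state, action)].append(episode_return)
        rets = self.returns[(state, action)]
        mean = sum(rets) / len(rets)
        variance = sum((r - mean) ** 2 for r in rets) / len(rets)

        return self.beta * novelty - self.lambda_risk * variance
```

In use, the sketch's output would simply be added to the extrinsic reward for each transition, e.g. `r_total = r_ext + module.intrinsic_reward(s, a, s_next, episode_return)`.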
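The subgoal selection and skill scoring of (2) can likewise be sketched as below, assuming states are coordinate tuples and trajectories are lists of (state, action, reward) transitions. The function names, the top_k cutoff, and the alpha trade-off are illustrative assumptions; the thesis only states that skills are scored by the initial-state-to-subgoal distance and the cumulative reward over that segment.

```python
# Sketch (assumed, not the thesis's code): pick subgoals among 'positive'
# states of a trajectory, then score the skill that reaches a chosen subgoal
# by cumulative reward (higher is better) and distance (shorter is better).
import math


def select_subgoals(trajectory, top_k=3):
    """Pick the top-k states reached via positive extrinsic reward from one
    trajectory of (state, action, reward) transitions."""
    positive = [(r, s) for (s, _a, r) in trajectory if r > 0]
    positive.sort(key=lambda item: item[0], reverse=True)
    return [s for (_r, s) in positive[:top_k]]


def score_skill(trajectory, subgoal, alpha=0.5):
    """Score the skill that carries the agent from the trajectory's initial
    state to `subgoal`: cumulative reward up to the subgoal raises the score,
    Euclidean distance from the initial state lowers it."""
    states = [s for (s, _a, _r) in trajectory]
    idx = states.index(subgoal)  # first visit to the subgoal
    cumulative = sum(r for (_s, _a, r) in trajectory[: idx + 1])
    distance = math.dist(states[0], subgoal)  # Euclidean; Python >= 3.8
    return cumulative - alpha * distance


def compose_skill(subgoals):
    """Chain two or more subgoals into a composite skill: the agent pursues
    each subgoal in sequence, the last segment ending at the terminal state."""
    return list(subgoals)
```

Scoring each candidate skill this way, and chaining the best-scoring subgoals, gives the agent reusable behavior that persists after the intrinsic reward itself has decayed to zero.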
Keywords/Search Tags:Reinforcement learning, intrinsic reward, risk assessment, subgoal, skill