Research On Reinforcement Learning Method Based On Intention Control

Posted on: 2023-10-05
Degree: Master
Type: Thesis
Country: China
Candidate: S Wu
GTID: 2568307058463714
Subject: Control engineering
Abstract/Summary:
Deep reinforcement learning (DRL) is an important branch of machine learning that studies how agents can make better decisions in unknown environments, and it is one of the most promising routes toward artificial intelligence. DRL has already achieved major breakthroughs in fields such as games and robotics. Its goal is to find the optimal policy, that is, the policy that maximizes the expected return. To reach this goal, an agent must understand the state of the environment and choose actions that suit both the current situation and the task requirements. In traditional policy models, action selection depends mainly on state perception, historical memory, and model parameters. As a result, in final testing and practical application the agent's behavior is hard to control, its actions are irregular, and it struggles to complete the expected task. Humans, by contrast, usually consider their intention and motivation when carrying out a task and choose their behavior according to both the current situation and their will.

To bring the behavior selection mechanism of DRL closer to that of humans, so that the agent chooses actions that carry intention, this thesis starts from the policy model and designs a reinforcement learning model based on intention control, following the way intention governs action when humans perform tasks. Specifically, it designs a new objective function for intention-control-based reinforcement learning tasks that maximizes the expected return while also maximizing the mutual information (MI) between intention and action, thereby tying the intention variable to the action. It then derives an approximation of the mutual information objective, which allows the proposed intention-control objective function to be optimized effectively. Finally, the effectiveness of the proposed intention-control policy model is verified on the classical multi-objective continuous chain walking task and on MuJoCo control tasks.
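The abstract does not give the objective function itself. As a rough illustration only, an objective of the kind described could be written as below. This is a sketch under assumptions, not the thesis's actual formulation: the intention variable z, the trade-off weight \lambda, the discount \gamma, and the auxiliary inference network q_\phi are all notation introduced here. The lower bound shown is the standard variational bound on mutual information, used as a stand-in for the approximation the thesis derives.

```latex
% Assumed notation (not from the source): z = intention variable,
% \pi_\theta(a \mid s, z) = intention-conditioned policy,
% \lambda > 0 = trade-off weight, q_\phi = auxiliary inference network.
\begin{align}
  J(\theta) &= \mathbb{E}_{z \sim p(z),\ \tau \sim \pi_\theta(\cdot \mid \cdot, z)}
      \Big[ \sum_{t} \gamma^{t}\, r(s_t, a_t) \Big]
      + \lambda\, I(z; a \mid s), \\
  I(z; a \mid s) &= H(z \mid s) - H(z \mid s, a) \\
      &\ge H(z \mid s) + \mathbb{E}_{s,\, z,\, a}\big[ \log q_\phi(z \mid s, a) \big].
\end{align}
% The inequality holds because
% \mathbb{E}[\log p(z \mid s, a)] - \mathbb{E}[\log q_\phi(z \mid s, a)]
% is an expected KL divergence, which is non-negative.
```

If z is drawn independently of the state, H(z | s) reduces to the constant H(z), and maximizing the bound amounts to training q_\phi to recover the intention from the state-action pair while the policy is rewarded for making that recovery easy.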
Keywords: Reinforcement Learning, Mutual Information, Intentional Control, Proximal Policy Optimization, Data Mining