Research On Reinforcement Learning Method Based On Intention Control

Posted on:2023-10-05

Degree:Master

Type:Thesis

Country:China

Candidate:S Wu

Full Text:PDF

GTID:2568307058463714

Subject:Control engineering

Abstract/Summary:

Deep reinforcement learning (DRL) is an important learning method in machine learning that studies how agents can make better decisions in unknown environments. It is one of the most promising research directions toward the goal of artificial intelligence. DRL has achieved major breakthroughs in many fields, such as games and robotics. The goal of DRL is to find the optimal policy, that is, the policy that obtains the maximum expected return. To achieve this goal, an agent must understand the state of the environment and take actions that suit both the environmental situation and the task requirements. In a traditional policy model, action selection depends mainly on state perception, historical memory, and model parameters. In final testing and practical application, the agent's behavior is difficult to control, its actions are not standardized, and it has difficulty completing the expected task. In contrast, when humans want to complete a task, they usually consider their intention and motivation and choose the corresponding behavior according to the current situation and their will. To bring the behavior selection mechanism in DRL closer to that of humans, and to make the agent choose behavior that carries an intention, this thesis starts from the policy model and designs a reinforcement learning model based on intention control, following the essential way in which intention controls action when humans perform tasks. Specifically, the thesis designs a new objective function for intention-control-based reinforcement learning tasks that maximizes the expected return while also maximizing the mutual information (MI) between intention and action, so that the intention variable is tied to the action. Furthermore, an approximation of the mutual information objective is derived, which allows the proposed objective function with intention control to be solved effectively. Finally, the effectiveness of the proposed intention-control-based policy model is verified on the classical multi-objective continuous chain walking task and on MuJoCo control tasks.
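
The abstract does not give the objective function explicitly. As a rough illustration only, an objective of this kind is often written as an expected-return term plus a weighted mutual information term, with the mutual information replaced by a variational lower bound. The intention variable z, the weight \beta, and the auxiliary posterior q_\phi below are assumed notation and are not taken from the thesis, whose actual formulation may differ.

```latex
% Illustrative sketch only: z, \beta and q_\phi are assumed notation,
% not the thesis's own definitions.
\begin{align}
J(\theta) &= \mathbb{E}_{z \sim p(z),\; a_t \sim \pi_\theta(\cdot \mid s_t, z)}
             \Big[ \sum_{t} \gamma^{t} r_t \Big] + \beta \, I(z; a), \\
I(z; a)   &= H(z) - H(z \mid a) \\
          &\ge H(z) + \mathbb{E}_{z,\, a}\big[ \log q_\phi(z \mid a) \big].
\end{align}
```

The inequality holds because the KL divergence between the true posterior p(z \mid a) and any approximation q_\phi(z \mid a) is non-negative, and it becomes an equality when the two coincide. Under this sketch, maximizing the lower bound rewards actions from which the intention can be recovered.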

Keywords/Search Tags:

Reinforcement Learning, Mutual Information, Intentional Control, Proximal Policy Optimization, Data Mining

Related items

1	Robust Policy Gradient Algorithm Based On Actor-Critic In Deep Reinforcement Learning
2	Research On Agent Decision-making And Control Based On Deep Reinforcement Learning
3	Self Learning Control Of Mechanical Arm Based On Reinforcement Learning
4	Research On Dialog Generation Methods Based On Proximal Policy Optimization And Adversarial Learning
5	Research On Multi-Agent Collaboration Based On Value Decomposition And Proximal Policy Optimization
6	Research On Logic Synthesis Optimization Based On Reinforcement Learning
7	Research On Automatic Driving Control Decision Based On Deep Reinforcement Learning
8	Policy Adaptation With Contrastive Learning And Mutual Information In Meta Reinforcement Learning
9	Research On Robotic Arm Grabbing Method Based On Deep Reinforcement Learning
10	Robotic Intelligent Grasping Control Technology Based On Deep Reinforcement Learning