
Research On Robot Behavior Control Based On Deep Reinforcement Learning

Posted on: 2022-04-10
Degree: Master
Type: Thesis
Country: China
Candidate: Y Z Dong
Full Text: PDF
GTID: 2518306509984499
Subject: Computer Science and Technology
Abstract/Summary:
Deep reinforcement learning (DRL) is one of the important technologies for realizing behavior control of autonomous robots. In dynamic environments, the learning process of robot control often falls into local optima because of the high-dimensional continuous decision space. In addition, deep learning follows the "end-to-end" training concept and does not explain the internal dependencies of a robot. The main research work is summarized as follows.

First, this thesis proposes a Decomposed Actor-Critic with Attentional Graph Neural Networks (DAC-AGNN) to address the curse of dimensionality in robot control. DAC-AGNN uses the Structure Decomposition model to divide the high-dimensional robot into several low-dimensional sub-joints (each treated as an agent), which effectively reduces the robot's decision space. The AGNN model is then used to obtain the weight of each agent, and the DAC model is used to train the robot, so each agent can manage resources and learn independently.

Second, this thesis proposes a Decomposed Deep Deterministic Policy Gradient (D3PG) to address collaboration between the internal joints of a robot. D3PG uses two Collaboration Graph (CG) models (ATTENTION and PODT) to obtain the weight of each agent and realize trust allocation among the internal joints. In addition, a mechanism of centralized execution and distributed training is used to realize collaborative control of multiple joints in the robot.

Third, this thesis proposes Structure-Motivated Interactive Learning (SMILE) to address interpretability and communication between the internal joints. SMILE obtains the internal dependencies of the robot through the CG model in order to define different Degrees of Interaction (DoI). The Enhanced State of each agent then integrates the information of other joints, so that communication between different joints can be realized. Moreover, two improved CG models, APODT and PATTENTION, use the policy gradient of the global robot to guide the learning of the DoI.

The experimental results show that DAC-AGNN performs significantly better than the traditional Actor-Critic, which demonstrates the effectiveness of structural decomposition. However, DAC-AGNN easily falls into local optima. In contrast, D3PG-PCG achieves a higher reward value than existing DRL algorithms, which shows that cooperation among internal joints helps the robot explore the optimal strategy. In addition, D3PG can obtain the importance of each joint of the robot in different postures. The SMILE experiments also show significant differences between DoI algorithms, which shows that dynamic information interaction can significantly promote the robot's learning. In addition, SMILE can obtain the interdependency between the internal joints in any posture.
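
The abstract mentions using an attention-based model to obtain a weight for each joint (agent). The following is only a minimal illustrative sketch of that general idea, not the thesis's implementation: it computes scaled dot-product attention weights over per-joint feature vectors and uses them to combine per-joint value estimates. All names (`attention_weights`, `weighted_value`, `joint_keys`, `global_query`, `joint_values`) and all numbers are hypothetical assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product attention of one query against each joint's key.

    Returns one non-negative weight per joint; the weights sum to 1.
    """
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    return softmax(scores)


def weighted_value(weights, joint_values):
    """Combine per-joint value estimates using the attention weights."""
    return sum(w * v for w, v in zip(weights, joint_values))


# Hypothetical 3-joint robot: each joint (agent) has a 2-D key vector
# and a scalar value estimate. These numbers are made up.
joint_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
global_query = [1.0, 0.0]
joint_values = [0.5, -0.2, 1.0]

weights = attention_weights(global_query, joint_keys)
total_value = weighted_value(weights, joint_values)
print(weights, total_value)
```

In this toy setup the first and third joints receive equal weight because their keys have the same dot product with the query; how the thesis actually builds queries, keys, and the graph structure is not specified in the abstract.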
Keywords/Search Tags: Deep Reinforcement Learning, Robot Control, Structure Decomposition, Collaboration, Degree of Interaction