A Research Of Deep Reinforcement Learning Algorithms In Combination With Multi-relations

Posted on: 2021-07-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Geng
GTID: 2518306476952969
Subject: Computer Science and Technology
Abstract/Summary:
Deep Reinforcement Learning (DRL) combines the perceptual ability of deep learning with the decision-making ability of reinforcement learning to address complex decision-making problems that are difficult to solve with traditional reinforcement learning. However, DRL models still have limitations such as low sample efficiency and poor generalization to small changes in the environment. These limitations mainly reflect the tendency of DRL to overfit the training data, so it fails on problems with complex logical and relational structures. To deal with these problems, DeepMind proposed a novel Relational Reinforcement Learning (RRL) algorithm in 2018. The algorithm advocates learning and reusing entity- and relation-centered functions, which implies relational reasoning, and it greatly improves performance, sample efficiency, and inductive capability.

Relational reasoning is essential to the success of RRL models, whose computations focus explicitly on binary relations between pairs of entities. In the real world, multiple relations are more indicative of the way entities are connected. Although considering more entities in relational reasoning would benefit reinforcement learning, it comes at the high cost of exponential growth in computational complexity. Computing multiple relations while maintaining computational efficiency is therefore a major challenge. In addition, DRL in a multi-agent environment faces harder challenges than in a single-agent environment, including non-stationarity and multi-agent credit assignment, and there is no existing research on RRL in the multi-agent field. In response to these challenges, this thesis proposes multi-relational DRL algorithms for both the single-agent and the multi-agent setting.

In a single-agent environment, the current RRL model gives agents the ability to reason through a multi-head dot-product attention (MHDPA) mechanism. However, this mechanism considers only binary relations and is difficult to extend to multiple relations. This thesis proposes a new relational module called the Multiple Relational Core (MRC) and combines it with the classical A3C algorithm to form a complete single-agent multiple relations actor-critic (MRAC) algorithm. MRC has two main characteristics: it approximates multiple relations efficiently, and its computational complexity does not grow exponentially with increasing depth. To further reduce the computational complexity of MRC, we optimize the internal structure of the model and propose a simplified single-head version of the multiple relational core (MRCSH). MRCSH reduces computational overhead while preserving the capacity for relational reasoning.

In a multi-agent environment, this thesis proposes a novel multi-agent multi-relational actor-critic (MMRAC) algorithm based on the single-agent Soft Actor-Critic (SAC) algorithm. A multi-relational mechanism is shared among all agents' critics. The mechanism dynamically selects the agents that need attention by calculating the relations between agents, which enhances communication and cooperation and thus improves the performance of the algorithm. In addition, MMRAC further enhances the stability of the model by introducing a new multi-agent advantage function, which helps to solve the problem of multi-agent credit assignment. MMRAC is flexible enough to be used in most multi-agent learning tasks.

Finally, the effectiveness of the MRAC and MMRAC algorithms is verified in different game environments. In the single-agent environment, experimental results show that MRAC not only learns better strategies but also improves the sample efficiency and generalization ability of the model. Compared with the MRC module, MRCSH maintains the performance of the model while reducing the convergence time. In the multi-agent environment, MMRAC outperforms the other compared algorithms and shows better robustness and scalability.
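As an illustration of the binary relations that MHDPA computes, the following is a minimal single-head sketch of scaled dot-product attention over a set of entities. It is not the thesis's MRC, MRCSH, or MMRAC implementation. The function name `pairwise_attention`, the toy entity vectors, and the use of the raw entity vectors as queries, keys, and values (in place of learned linear projections) are all assumptions made for this example. Each entity scores every other entity, so N entities produce N x N pairwise weights; naively scoring relations over k entities at once would require on the order of N^k combinations, which is the cost the abstract refers to.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def pairwise_attention(queries, keys, values):
    """Single-head scaled dot-product attention (hypothetical helper).

    Returns (outputs, weights), where weights[i][j] is how strongly
    entity i attends to entity j, i.e. one binary relation per pair.
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # One score per (query entity, key entity) pair.
        scores = [dot(q, k) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of value vectors, one output per query entity.
        out = [
            sum(w_j * v[c] for w_j, v in zip(w, values))
            for c in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs, weights


# Toy entities (assumed data). Identity projections are an assumption;
# a real MHDPA layer would use learned projections and several heads.
entities = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = pairwise_attention(entities, entities, entities)

for i, row in enumerate(weights):
    print(i, [round(w, 3) for w in row])
```

Each row of `weights` sums to 1, and the table as a whole has one entry per ordered entity pair, which makes the quadratic cost of purely binary relational reasoning visible.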
Keywords: Reinforcement Learning, Relational Learning, Deep Learning, Artificial Intelligence