
Knowledge Sharing For Multi-agent Reinforcement Learning Via Teacher-student Paradigm

Posted on: 2021-01-28  Degree: Master  Type: Thesis
Country: China  Candidate: C X Zhu  Full Text: PDF
GTID: 2428330611467009  Subject: Software engineering
Abstract/Summary:
Reinforcement Learning (RL) has been widely used to solve sequential decision-making problems. However, RL algorithms suffer from poor sample efficiency and require a long time to learn a suitable policy, especially when agents learn without prior knowledge. This problem can be alleviated by reusing knowledge from other agents during the learning process. One notable approach is action advising based on a teacher-student relationship, where the decisions of a student agent during learning are aided by an experienced teacher agent.

In cooperative multi-agent reinforcement learning (MARL), agents need to learn a jointly optimal policy rather than individually optimal policies. A student may fail to cooperate well with others even when following the teacher's suggested actions, since the policies of all agents are still changing before convergence. When the number of times that agents may communicate with one another is limited (i.e., there is a budget constraint), an advising strategy that uses actions as advice may not be good enough.

RL algorithms learn the optimal policy by estimating the cumulative discounted reward (i.e., the Q-value) of each state-action pair, so that an agent can perform the optimal action, the one with the maximum Q-value, in every state. If, in the current state, a student can select its next action based on a teacher's learned Q-values, then it does not need to spend more time learning for that state. We therefore propose a partaker-sharer advising framework (PSAF) for cooperative MARL agents learning under budget constraints. In PSAF, multiple decentralized Q-learners can share a limited number of Q-values to accelerate joint learning. To model communication cost, the number of times each agent asks for Q-values and the number of times it gives Q-values are constrained by two numeric budgets, respectively. Agents must therefore choose the proper time to share Q-values.

The Q-value sharing framework requires that agents have similar or even identical reward functions. By contrast, action advising only demands that student and teacher share a common understanding of the advised states and the action space, which is more flexible in some situations, for example, when agents have different policy representations. Moreover, traditional teacher-student frameworks mainly address what and when to advise, but ignore the problem of how to use the teacher's advice more effectively. Based on these observations, we also propose methods that allow a student to choose between reusing previous advice and learning as in the usual teacher-student framework.

We evaluate PSAF on three classical multi-agent tasks: the Predator-Prey domain, Half Field Offense, and the Spread game. To evaluate learning by reusing previous advice, we add a single-agent task, Mario, in addition to Predator-Prey and Half Field Offense. All experiments show that our approaches significantly outperform the existing advising method (without reusing advice). More importantly, our proposed method PSAF spends a much smaller budget than the traditional teacher-student framework.
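The mechanism described above, decentralized Q-learners that may ask for and give Q-values under two separate numeric budgets, can be illustrated with a minimal sketch. This is not the thesis's actual PSAF algorithm: the class name, the random choice of which peer to ask, and the policy of asking whenever budget remains are all simplifying assumptions for illustration; the thesis presumably uses more careful triggers for when to ask and when to give.

```python
import random
from collections import defaultdict

class PSAFAgent:
    """Hypothetical sketch of a decentralized Q-learner with ask/give budgets.

    Not the thesis's actual PSAF implementation; a simplification in which an
    agent asks a random peer whenever its ask budget remains.
    """

    def __init__(self, actions, ask_budget=100, give_budget=100,
                 alpha=0.1, gamma=0.95, epsilon=0.1):
        self.Q = defaultdict(float)        # Q[(state, action)] -> estimated value
        self.actions = list(actions)
        self.ask_budget = ask_budget       # times this agent may request Q-values
        self.give_budget = give_budget     # times this agent may share Q-values
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def give(self, state):
        """Share this agent's Q-values for `state` if the give budget allows."""
        if self.give_budget > 0:
            self.give_budget -= 1
            return {a: self.Q[(state, a)] for a in self.actions}
        return None

    def act(self, state, peers=()):
        """Epsilon-greedy action selection, optionally guided by a peer's Q-values."""
        if self.ask_budget > 0 and peers:
            advice = random.choice(list(peers)).give(state)
            if advice is not None:
                self.ask_budget -= 1
                # Act greedily on the sharer's Q-values for this state.
                return max(advice, key=advice.get)
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.Q[(state, a)])

    def update(self, s, a, r, s_next):
        """Standard one-step Q-learning update."""
        best_next = max(self.Q[(s_next, a2)] for a2 in self.actions)
        self.Q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.Q[(s, a)])
```

Both budgets only ever decrease, so once either is exhausted the agent falls back to ordinary epsilon-greedy Q-learning; this matches the abstract's point that agents must choose carefully when sharing is worth its cost.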
Keywords/Search Tags: reinforcement learning, multi-agent learning, teacher-student framework, knowledge sharing