Recent advances in Deep Reinforcement Learning (DRL) have achieved remarkable success, reaching human-level control in complex tasks. However, several limitations and challenges remain in improving the learning efficiency of DRL. First, DRL still suffers from sample inefficiency, which makes learning from scratch difficult, especially on complex, large-scale problems. Second, deep Multiagent Reinforcement Learning (MARL) algorithms face the curse of dimensionality as the number of agents grows, so the sample-inefficiency problem becomes much more severe. Finally, in multiagent systems the behavior of each agent is contingent on the coexisting agents; moreover, agents may exhibit sophisticated behaviors, making it harder to accurately predict the behaviors of other agents and make a best response accordingly. To this end, this thesis proposes three solutions that leverage transfer learning to facilitate efficient DRL and MARL, along three research directions: single-agent policy transfer, multiagent policy transfer in cooperative environments, and efficient multiagent response in competitive or cooperative environments. The main contents of this thesis are as follows.

First, DRL still suffers from sample inefficiency, especially when the state-action space becomes large, which makes learning from scratch difficult. To address this problem, we propose a novel Policy Transfer Framework (PTF) that models multi-policy transfer as an option learning problem, determining when and which source policy is best to reuse for the target policy, and when to terminate it. In addition, we propose an adaptive, heuristic mechanism to ensure efficient reuse of source policies and to avoid negative transfer. Both existing value-based and policy-based DRL approaches can be incorporated, and experimental results show that PTF significantly boosts the performance of existing DRL approaches and outperforms state-of-the-art policy transfer methods in both discrete and continuous action spaces.

Second, this thesis considers the dual problems of sample inefficiency and high environmental complexity in cooperative multiagent learning, and proposes a novel Multiagent Option-based Policy Transfer (MAOPT) framework to improve MARL efficiency and coordination. MAOPT learns what advice to provide to each agent, and when to terminate it, by modeling multiagent policy transfer as an option learning problem. To handle the inconsistency in each agent's experience caused by partial observability, we propose successor-representation option learning, which decouples the environment dynamics from the rewards and learns the option value under each agent's preference. Experimental results show that MAOPT significantly boosts the performance of existing methods in both discrete and continuous state spaces.

Finally, in multiagent environments the ideal behavior of an agent is contingent on the behaviors of coexisting agents. However, agents may adapt their behaviors to the contexts they encounter. Hence, it is critical for an agent to quickly predict or recognize the behaviors of other agents and make a best response accordingly. To solve this problem, this thesis proposes a novel approach called Bayes-ToMoP, which efficiently predicts the strategies of opponents that use either stationary or higher-level reasoning strategies. Bayes-ToMoP also supports detecting previously unseen policies and learning a best-response policy accordingly. Experimental results show that both Bayes-ToMoP and deep Bayes-ToMoP outperform state-of-the-art approaches when faced with different types of opponents in various games.

In summary, this thesis aims to improve the learning efficiency of DRL and deep MARL algorithms through transfer learning, and investigates in depth how to achieve this goal by addressing three challenges: sample inefficiency in DRL algorithms; the dual challenges of sample inefficiency and high environmental complexity in deep MARL; and fast detection of, and best response to, sophisticated opponents in multiagent environments. All of this research has been validated through extensive experiments. It contributes to the development of DRL and its application in practical domains, and it plays an important role in improving multiagent cooperation and collaboration, extending deep MARL algorithms to large-scale multiagent systems (MASs).
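To make the option view of policy transfer concrete, the following is a minimal tabular sketch of the general idea: each source policy is treated as an option, an option value Q(s, o) is learned to decide which source policy to reuse in a state, and the option is terminated when it stops being the best choice. All names (`OptionPolicyTransfer`, `select_option`, `should_terminate`) are hypothetical illustrations, not the thesis implementation.

```python
import random
from collections import defaultdict

class OptionPolicyTransfer:
    """Tabular sketch: treat each source policy as an option, learn which
    one to reuse in each state, and decide when to terminate it."""

    def __init__(self, num_options, epsilon=0.1, alpha=0.5, gamma=0.99,
                 term_threshold=0.0):
        # Q(s, o): value of reusing source policy o in state s.
        self.q = defaultdict(lambda: [0.0] * num_options)
        self.num_options = num_options
        self.epsilon, self.alpha, self.gamma = epsilon, alpha, gamma
        self.term_threshold = term_threshold

    def select_option(self, state):
        """Epsilon-greedy choice of which source policy to reuse."""
        if random.random() < self.epsilon:
            return random.randrange(self.num_options)
        values = self.q[state]
        return values.index(max(values))

    def should_terminate(self, state, option):
        """Terminate the current option when another clearly dominates it."""
        values = self.q[state]
        return max(values) - values[option] > self.term_threshold

    def update(self, state, option, reward, next_state, done):
        """One-step Q-learning-style update for the chosen option."""
        target = reward if done else reward + self.gamma * max(self.q[next_state])
        self.q[state][option] += self.alpha * (target - self.q[state][option])
```

In a full system the chosen option would shape the target policy's update (e.g. as an auxiliary loss) rather than act directly; this sketch only shows the "when and which source policy" selection logic.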
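The successor-representation idea of decoupling dynamics from rewards can be sketched in a few lines: a tabular successor representation psi[s] estimates the expected discounted future occupancy of every state, and a value under any agent's preference is then just psi[s] dotted with that agent's reward weights. This is a generic SR sketch under simplifying tabular assumptions, not the thesis's option-learning variant; the function name is hypothetical.

```python
import numpy as np

def sr_td_update(psi, s, s_next, gamma=0.9, alpha=0.1):
    """TD update of a tabular successor representation (one row per state):
    psi[s] <- psi[s] + alpha * (one_hot(s) + gamma * psi[s_next] - psi[s])."""
    n = psi.shape[0]
    one_hot = np.eye(n)[s]
    psi[s] += alpha * (one_hot + gamma * psi[s_next] - psi[s])
    return psi

# The learned dynamics (psi) can be reused with different reward weights w,
# giving each agent a value under its own preference: V(s) = psi[s] @ w.
```

The key point for the multiagent setting is that psi captures shared environment dynamics once, while each agent plugs in its own reward weights, which is what allows option values to be evaluated "under each agent's preference".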
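The opponent-prediction component can be illustrated with a plain Bayesian belief update over a set of known opponent strategies: each candidate strategy assigns a likelihood to the observed action, and the posterior is renormalized accordingly; when no known strategy explains the observation, that is evidence of a previously unseen policy. This is a generic Bayes-rule sketch, not Bayes-ToMoP itself (which additionally models higher-level, recursive reasoning); the function name and fallback rule are hypothetical.

```python
def update_belief(belief, likelihoods):
    """Posterior over known opponent strategies after one observed action.

    belief: dict strategy -> prior probability
    likelihoods: dict strategy -> probability that strategy assigns
                 to the action just observed
    """
    posterior = {k: belief[k] * likelihoods[k] for k in belief}
    z = sum(posterior.values())
    if z == 0.0:
        # No known strategy explains the observation: treat this as a
        # signal of a previously unseen policy and reset to uniform.
        return {k: 1.0 / len(belief) for k in belief}
    return {k: v / z for k, v in posterior.items()}
```

An agent would then play the best response associated with the most probable strategy, and trigger learning of a new best-response policy when the unseen-policy signal persists.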