In recent years, deep reinforcement learning has made great progress in optimizing agent behavior in complex environments. ChatGPT, the chatbot recently released by OpenAI, is a typical example: it is fine-tuned from a GPT-3.5-series model with the PPO algorithm and shows impressive conversational skills. However, in many complex domains similar to chatbots, applying deep reinforcement learning remains challenging and costly. To promote its application in complex environments across fields, researchers need simulation environments that provide rich and diverse training scenarios. Games are an ideal low-cost simulation environment, and MOBA games in particular are a natural testbed for studying the practical deployment of reinforcement learning because of their high complexity, dynamic changes, and multi-agent collaboration. Lessons learned at this level of complexity can serve as a useful reference for applying reinforcement learning in other fields. On this basis, this thesis explores how to train MOBA game AI with deep reinforcement learning and how to improve the capability of the resulting models. It focuses on two important problems in MOBA game AI training and proposes a corresponding improvement for each. The first is that fictitious self-play can easily lead to strategy degradation; the second is how to train AI efficiently with limited training resources. For the first problem, the thesis proposes an improved algorithm based on OpenAI Five's heuristic fictitious self-play algorithm. By adding a policy diversity evaluation indicator and modifying the scoring function, it improves training effectiveness and further increases the strength of the model. The algorithm was experimentally verified on a 3v3 snake game and achieved good results. For the second problem, the thesis takes the 3v3 Changping Attack Defense War mode of Honor of Kings (King Glory) as an experimental
environment to explore the challenges and opportunities that reinforcement learning faces in MOBA game AI. It addresses four problems: temporal memory that is difficult to train, interference in multi-agent communication, the impact of switching AI training stages on the value network, and the trade-off between network depth and overfitting during training. In response, it proposes a hybrid design of fully connected and recurrent neural networks, an adjustable communication mechanism based on multi-head attention, a decoupled multi-agent value network design based on self-play learning, and an improved PSCN structure. It also elaborates on the importance of phased reward function design and temporal reward design for improving the performance of MOBA game AI. With these improvements, the 3v3 AI model reaches the level of an intermediate human player while using fewer training resources.