
Design And Development Of A Distributed Platform For Competitive Self-play Based Reinforcement Learning

Posted on: 2024-03-05    Degree: Master    Type: Thesis
Country: China    Candidate: W L Miao    Full Text: PDF
GTID: 2568306914961599    Subject: Communication Engineering (including broadband network, mobile communication, etc.) (Professional Degree)
Abstract/Summary:
Competitive Self-Play based Reinforcement Learning (CSP-RL), and multi-agent reinforcement learning in particular, has achieved remarkable results in many complex, large-scale game environments such as Dota 2, StarCraft II, and Honor of Kings. From the perspective of game theory, self-play based methods search for a Nash Equilibrium (NE) through fictitious self-games, an approach that applies to competitive environments in general. Compared with the traditional deep learning development process, developing deep reinforcement learning algorithms in a competitive self-play setting is considerably more complex and places higher demands on the engineering skills of algorithm researchers.

In recent years, the concept of MLOps, the DevOps of machine learning, has gradually emerged; its main purpose is to establish standardized processes for model development, deployment, and operation. However, the academic community currently lacks a full-stack platform for CSP-RL scenarios that covers the entire algorithm development process. To reduce the difficulty of developing and testing algorithms in competitive reinforcement learning scenarios, and inspired by MLOps, this thesis designs and implements a distributed algorithm development platform for CSP-RL that supports the full algorithm development workflow of a multi-user, competitive reinforcement learning environment.

The full-stack platform designed in this thesis provides an end-to-end solution for development environment construction, resource allocation, model hosting, performance evaluation, and scalable distributed reinforcement learning training in multi-user scenarios. The platform is implemented cloud-natively on Kubernetes and offers excellent scalability and observability. Compared with general-purpose MLOps platforms, it incorporates many optimizations specific to CSP-RL scenarios that simplify the development of reinforcement learning algorithms.

To further demonstrate the platform's capabilities, this thesis designs and implements a high-performance CSP-RL algorithm on top of the platform's distributed training system in Pommerman's 2v2 competitive game environment, and carries out detailed algorithm design and performance testing in this scenario. The experimental results show that the agent trained by the proposed algorithm learns rapidly against the benchmark agent, achieving a win rate of over 80% after 600 training iterations. The distributed training system also scales well: when training in the Pommerman environment, it achieves a near-linear speedup up to 100 CPU cores. The Pommerman case study thus demonstrates both the ability of CSP-RL to solve complex game problems and the efficient scalability of the platform architecture for distributed reinforcement learning training.

The algorithm development platform proposed in this thesis improves the efficiency of CSP-RL algorithm development, training, and iteration, and helps algorithm researchers evaluate and improve algorithms quickly. It also allows beginners to learn and implement core CSP-RL algorithms more efficiently, without having to deal with complex distributed engineering.
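The abstract contains no code, but the core self-play idea it describes, an agent improving by repeatedly playing against copies of itself until its average strategy approaches a Nash Equilibrium, can be illustrated with a minimal sketch. The Python snippet below is not taken from the thesis or its platform; it runs plain fictitious self-play on rock-paper-scissors, a toy zero-sum game chosen only so the example is self-contained, and shows the empirical average strategy drifting toward the uniform NE.

import numpy as np

# Illustrative only (not code from the thesis): fictitious self-play on
# rock-paper-scissors. The agent repeatedly best-responds to the empirical
# average of its own past play; that average strategy approaches the
# uniform Nash equilibrium (1/3, 1/3, 1/3).

# Row player's payoff matrix; rows/columns are rock, paper, scissors.
PAYOFF = np.array([[ 0, -1,  1],
                   [ 1,  0, -1],
                   [-1,  1,  0]], dtype=float)


def best_response(opponent_mix: np.ndarray) -> int:
    """Index of the pure strategy with the highest expected payoff."""
    return int(np.argmax(PAYOFF @ opponent_mix))


def fictitious_self_play(iterations: int = 100_000) -> np.ndarray:
    counts = np.ones(3)  # pseudo-counts of the agent's past actions
    for _ in range(iterations):
        avg_strategy = counts / counts.sum()
        # In self-play, the "opponent" is the agent's own historical average.
        counts[best_response(avg_strategy)] += 1.0
    return counts / counts.sum()


if __name__ == "__main__":
    print("average strategy:", np.round(fictitious_self_play(), 3))
    # Expected: approximately [0.333 0.333 0.333], the uniform Nash equilibrium.

In a large-scale setting such as the Pommerman experiments, the best-response step is replaced by reinforcement learning against a pool of past policy snapshots and is distributed across many workers, which is precisely the part the platform described above automates.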
Keywords/Search Tags: reinforcement learning, algorithm platform, self-play, distributed training