| In recent years, deep reinforcement learning (RL) research has made considerable progress in fields such as games, robotics, and Go, drawing increasing attention from both industry and academia. As research on deep RL algorithms continues, practitioners and researchers have a growing need for convenient and efficient RL platforms. Research institutions and technology companies have recently proposed many advanced RL platforms (such as XingTian and RLlib). However, horizontal comparisons between these platforms are lacking, and the platforms generally suffer from low resource utilization, which slows the training of deep RL jobs. This is unfriendly to those who submit training jobs, to the clusters on which RL platforms are deployed, and to cluster owners. Through a comparative analysis of currently popular RL platforms, this paper identifies deficiencies in the training component and in the resource utilization of RL platforms, as represented by the XingTian platform developed by Huawei Noah's Ark Lab. We then propose a training component based on communication chain optimization and design a resource optimization approach that applies hierarchical filtering of interference and communication. We further propose a performance prediction method based on the characteristics of RL applications; accurate prediction of RL performance facilitates efficient job deployment on the RL platform and fuller use of cluster resources. We evaluate the proposed optimization strategies (XingTian architecture optimization and resource scheduling optimization) using mainstream deep RL algorithms in representative simulation environments. The results show that, while preserving the reward value of the RL job, the throughput, CPU utilization, and GPU utilization of the system improve by 49.2%, 38.6%, and 26.9%, respectively. |