
The Design And Implementation Of The GPU Resource Management Component In Transwarp Container Platform

Posted on: 2022-08-27
Degree: Master
Type: Thesis
Country: China
Candidate: C M Gong
Full Text: PDF
GTID: 2518306725984029
Subject: Master of Engineering
Abstract:
As container cloud technology, represented by Docker and Kubernetes, has matured, many enterprises have begun to containerize their applications and manage them with Kubernetes, AI applications included. In current container cloud technology, however, the virtualization and management of GPU resources are still at an early stage. AI applications, especially deep learning applications, rely on GPUs to accelerate computation, which makes deploying them on Kubernetes challenging.

Kubernetes currently manages GPU resources through a device plugin provided by NVIDIA, which lets workloads deployed in Kubernetes use GPUs. This approach, however, only supports exclusive use of a physical GPU device by a single container, which causes two problems: (1) GPU resources are underutilized by tasks that consume few of them, such as model inference; (2) average task latency rises under high concurrency when a customer's GPU devices are limited in number.

To solve these problems, this thesis proposes integrating GPU sharing into Kubernetes so that multiple workloads can share a physical GPU, thereby increasing GPU utilization and reducing average task latency. The thesis presents a solution called Krux, which leverages the extensibility mechanisms offered by Kubernetes to define how GPUs are used and shared in Kubernetes. Based on this solution, a GPU resource management component is implemented, consisting of four functional modules: the GPU device plugin module, the GPU scheduling plugin module, the container runtime module, and the resource limitation module.

The GPU device plugin module is built on the Kubernetes device plugin mechanism; it monitors the GPU devices on each worker node and reports the virtualized GPU resources to the Kubernetes cluster. The GPU scheduling plugin module is built on the Kubernetes scheduling framework and works with the default scheduler to make scheduling decisions for GPU workloads. The resource limitation module intercepts the CUDA driver API and adds control logic that limits the GPU resources a CUDA application may use. The container runtime module makes customized modifications to the official NVIDIA container runtime and acts as a bridge between the two modules upstream in the container-creation path and the resource limitation module.

The GPU resource management component has been integrated into version 3.0 of Transwarp's container platform. The big data and AI products supported by the platform have begun to use GPU sharing widely in the latest development version, effectively improving GPU utilization and reducing average task latency, which helps customers cut hardware costs.
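To make the usage model concrete, the sketch below shows how a workload would request a slice of a shared GPU through the standard Kubernetes extended-resource interface. The resource name "transwarp.io/vgpu" and the container image are placeholders assumed for illustration; the thesis does not state the names Krux actually registers.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// sharedGPUPod builds a Pod that asks for one virtual-GPU slice instead
// of a whole physical device. "transwarp.io/vgpu" is an assumed
// placeholder for whatever extended resource the device plugin registers.
func sharedGPUPod() *corev1.Pod {
	return &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "inference-server"},
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{{
				Name:  "model",
				Image: "example.com/inference:latest", // placeholder image
				Resources: corev1.ResourceRequirements{
					// Extended resources must appear in Limits; the kubelet
					// forwards the grant to the matching device plugin.
					Limits: corev1.ResourceList{
						"transwarp.io/vgpu": resource.MustParse("1"),
					},
				},
			}},
		},
	}
}

func main() {
	fmt.Println(sharedGPUPod().Name)
}
```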
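The device plugin module described above follows the Kubernetes device plugin gRPC contract: ListAndWatch advertises devices to the kubelet, and Allocate is called when a container is granted some of them. A minimal Go sketch of advertising each physical GPU as several virtual slices follows; the slicing scheme, device ID format, and environment-variable handoff are assumptions for illustration rather than Krux's actual design, and registration with the kubelet is omitted.

```go
package vgpuplugin

import (
	"context"
	"fmt"
	"strings"
	"time"

	pluginapi "k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1"
)

// vgpuPlugin advertises each physical GPU as several virtual "slices".
type vgpuPlugin struct {
	physicalGPUs int // discovered via NVML in a real plugin
	slicesPerGPU int // how many containers may share one GPU
}

// ListAndWatch streams the virtual device list to the kubelet.
func (p *vgpuPlugin) ListAndWatch(_ *pluginapi.Empty, s pluginapi.DevicePlugin_ListAndWatchServer) error {
	var devs []*pluginapi.Device
	for g := 0; g < p.physicalGPUs; g++ {
		for i := 0; i < p.slicesPerGPU; i++ {
			devs = append(devs, &pluginapi.Device{
				ID:     fmt.Sprintf("gpu%d-slice%d", g, i),
				Health: pluginapi.Healthy,
			})
		}
	}
	if err := s.Send(&pluginapi.ListAndWatchResponse{Devices: devs}); err != nil {
		return err
	}
	// A real plugin re-sends the list on health changes; here we idle.
	for {
		time.Sleep(10 * time.Second)
	}
}

// Allocate maps the granted virtual slices back to physical GPU indices
// and passes them to the container through environment variables.
func (p *vgpuPlugin) Allocate(_ context.Context, req *pluginapi.AllocateRequest) (*pluginapi.AllocateResponse, error) {
	resp := &pluginapi.AllocateResponse{}
	for _, cr := range req.ContainerRequests {
		gpus := map[string]bool{}
		for _, id := range cr.DevicesIDs {
			gpus[strings.SplitN(id, "-", 2)[0]] = true // "gpu0-slice3" -> "gpu0"
		}
		var visible []string
		for g := range gpus {
			visible = append(visible, strings.TrimPrefix(g, "gpu"))
		}
		resp.ContainerResponses = append(resp.ContainerResponses, &pluginapi.ContainerAllocateResponse{
			Envs: map[string]string{
				// Read by the NVIDIA container runtime to expose the GPU.
				"NVIDIA_VISIBLE_DEVICES": strings.Join(visible, ","),
			},
		})
	}
	return resp, nil
}

// Remaining DevicePluginServer methods are no-ops in this sketch.
func (p *vgpuPlugin) GetDevicePluginOptions(context.Context, *pluginapi.Empty) (*pluginapi.DevicePluginOptions, error) {
	return &pluginapi.DevicePluginOptions{}, nil
}
func (p *vgpuPlugin) GetPreferredAllocation(context.Context, *pluginapi.GetPreferredAllocationRequest) (*pluginapi.GetPreferredAllocationResponse, error) {
	return &pluginapi.GetPreferredAllocationResponse{}, nil
}
func (p *vgpuPlugin) PreStartContainer(context.Context, *pluginapi.PreStartContainerRequest) (*pluginapi.PreStartContainerResponse, error) {
	return &pluginapi.PreStartContainerResponse{}, nil
}
```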
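The scheduling plugin module plugs into the Kubernetes scheduling framework, which exposes extension points such as Filter and Score alongside the default scheduler. The sketch below implements only the Filter point, rejecting nodes that lack free virtual-GPU capacity; the actual Krux plugin presumably reasons about finer-grained quantities such as per-GPU memory, which this sketch does not attempt, and the resource name is again an assumed placeholder.

```go
package vgpusched

import (
	"context"

	v1 "k8s.io/api/core/v1"
	"k8s.io/kubernetes/pkg/scheduler/framework"
)

const vgpuResource = "transwarp.io/vgpu" // assumed resource name

// VGPUFilter rejects nodes without enough free virtual-GPU slices.
type VGPUFilter struct{}

var _ framework.FilterPlugin = &VGPUFilter{}

func (f *VGPUFilter) Name() string { return "VGPUFilter" }

func (f *VGPUFilter) Filter(_ context.Context, _ *framework.CycleState, pod *v1.Pod, nodeInfo *framework.NodeInfo) *framework.Status {
	requested := podVGPURequest(pod)
	if requested == 0 {
		return framework.NewStatus(framework.Success)
	}
	node := nodeInfo.Node()
	if node == nil {
		return framework.NewStatus(framework.Error, "node not found")
	}
	allocatable := node.Status.Allocatable[vgpuResource]
	used := int64(0)
	for _, pi := range nodeInfo.Pods {
		used += podVGPURequest(pi.Pod)
	}
	if used+requested > allocatable.Value() {
		return framework.NewStatus(framework.Unschedulable, "not enough vGPU slices")
	}
	return framework.NewStatus(framework.Success)
}

// podVGPURequest sums the virtual-GPU limits across a pod's containers.
func podVGPURequest(pod *v1.Pod) int64 {
	total := int64(0)
	for _, c := range pod.Spec.Containers {
		if q, ok := c.Resources.Limits[vgpuResource]; ok {
			total += q.Value()
		}
	}
	return total
}
```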
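The resource limitation module works by intercepting CUDA driver API calls such as cuMemAlloc and cuMemFree. The interception layer itself has to be a native library injected into the container (for example via LD_PRELOAD), so the Go sketch below shows only the quota bookkeeping such a shim would perform; all names here are assumptions for illustration, not the thesis's implementation.

```go
package gpulimit

import (
	"errors"
	"sync"
)

// quotaTracker mirrors the bookkeeping a CUDA-interception shim performs:
// before forwarding cuMemAlloc to the real driver it checks the
// container's memory quota, and on cuMemFree it returns bytes to the pool.
type quotaTracker struct {
	mu    sync.Mutex
	limit uint64 // bytes this container may allocate, set at startup
	used  uint64 // bytes currently allocated
}

var errQuotaExceeded = errors.New("vGPU memory quota exceeded")

// reserve is consulted on every intercepted cuMemAlloc. A real shim
// translates this error into CUDA_ERROR_OUT_OF_MEMORY, so the application
// sees an ordinary allocation failure.
func (t *quotaTracker) reserve(bytes uint64) error {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.used+bytes > t.limit {
		return errQuotaExceeded
	}
	t.used += bytes
	return nil
}

// release is consulted on every intercepted cuMemFree.
func (t *quotaTracker) release(bytes uint64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if bytes > t.used {
		bytes = t.used
	}
	t.used -= bytes
}
```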
Keywords: Cloud Computing, Deep Learning, Container, Kubernetes, GPU Sharing