
Research on Interference-Aware GPU Resource Provisioning for Predictable DNN Inference

Posted on: 2022-11-21
Degree: Master
Type: Thesis
Country: China
Candidate: J N Xu
Full Text: PDF
GTID: 2518306776492894
Subject: Automation Technology
Abstract/Summary:
With the burgeoning demand for latency-sensitive artificial intelligence (AI) computation, GPUs are essential for accelerating deep neural network (DNN) inference workloads in cloud datacenters. The traditional exclusive and temporal sharing of GPUs for executing DNN inference workloads intrinsically wastes GPU resources. To fully utilize GPUs, spatial sharing among co-located DNN inference workloads has become increasingly compelling. Motivated by our empirical measurement study of DNN inference on Amazon EC2 GPU instances, we find that the performance interference among co-located inference workloads is noticeable. Through an in-depth analysis of the motivating experiments, we further identify the root causes of such interference: severe contention for the GPU scheduler and GPU L2 cache space, as well as GPU power consumption.

Existing works on guaranteeing performance Service Level Objectives (SLOs) of DNN inference focus on either temporal sharing of GPUs or reactive GPU resource scaling and inference migration; how to proactively mitigate such severe performance interference has received comparatively little attention. To fill this gap, this thesis proposes iGniter, an interference-aware GPU resource provisioning framework for cost-efficiently achieving predictable DNN inference in the cloud. Specifically, iGniter comprises two key components: (1) a lightweight DNN inference performance model, which leverages practically accessible system and workload metrics to explicitly capture the performance interference under different batch sizes and GPU resource allocations, and can accurately predict DNN inference performance; and (2) a cost-efficient GPU resource provisioning strategy that jointly optimizes GPU resource allocation and adaptive batching based on the inference performance model, with the aim of achieving predictable performance for DNN inference workloads.

We implement a prototype of iGniter on top of the NVIDIA Triton Inference Server on Amazon EC2 GPU instances. Extensive prototype experiments on four representative DNN models and datasets demonstrate that iGniter can guarantee DNN inference performance SLOs while reducing monetary cost by up to 25% compared with state-of-the-art GPU resource provisioning strategies, with acceptable runtime overhead.
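To make the interference-aware performance model concrete, below is a minimal Python sketch; it is not iGniter's actual model or API, and every name, coefficient, and formula here is an illustrative assumption. It treats predicted latency as a profiled solo-run latency plus additive penalties for the two contention sources the thesis identifies (GPU scheduler contention and L2 cache overflow), and includes a helper for checking a latency SLO, mirroring how a provisioning strategy could search over GPU shares and batch sizes.

from dataclasses import dataclass

@dataclass
class Workload:
    batch_size: int          # batch size chosen by adaptive batching
    gpu_share: float         # fraction of GPU compute allocated, in (0, 1]
    solo_latency_ms: float   # latency profiled alone at this share and batch
    l2_footprint_mb: float   # profiled L2 cache demand

def predict_latency(target: Workload, colocated: list[Workload],
                    l2_capacity_mb: float = 40.0,
                    sched_coef: float = 0.05,
                    cache_coef: float = 0.3) -> float:
    # Hypothetical additive model: solo latency, plus a scheduler-contention
    # term that grows with the number of co-located workloads, plus an
    # L2-cache term that kicks in once aggregate demand exceeds capacity.
    sched_penalty = sched_coef * len(colocated) * target.solo_latency_ms
    total_l2 = target.l2_footprint_mb + sum(w.l2_footprint_mb for w in colocated)
    overflow = max(0.0, total_l2 - l2_capacity_mb) / l2_capacity_mb
    cache_penalty = cache_coef * overflow * target.solo_latency_ms
    return target.solo_latency_ms + sched_penalty + cache_penalty

def meets_slo(target: Workload, colocated: list[Workload], slo_ms: float) -> bool:
    # A provisioning strategy could call this while searching for the
    # cheapest (gpu_share, batch_size) placement that satisfies the SLO.
    return predict_latency(target, colocated) <= slo_ms

# Toy example with made-up numbers:
w1 = Workload(batch_size=8, gpu_share=0.5, solo_latency_ms=12.0, l2_footprint_mb=24.0)
w2 = Workload(batch_size=4, gpu_share=0.5, solo_latency_ms=9.0, l2_footprint_mb=30.0)
print(predict_latency(w1, [w2]))  # 13.86 ms under these toy coefficients

The point of such a model is that it is cheap to evaluate, so the provisioning strategy can enumerate candidate placements and batch sizes and keep only the lowest-cost configuration whose predicted latency stays within the SLO, rather than reacting to violations after they occur.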
Keywords/Search Tags:GPU resource provisioning, cloud-based DNN inference, predictable performance, performance interference