GPUs have been widely used in modern datacenters to accelerate emerging services such as graph processing, intelligent personal assistants (IPA), and deep learning. However, current GPUs have very limited support for sharing: they are shared in a time-multiplexed manner in datacenters, which leads to low throughput. Previous studies on GPU kernel scheduling either target fairness or only share GPUs statically, and therefore cannot handle dynamically arriving kernels. Recent work has proposed a hardware preemption mechanism for GPUs, enabling dynamic sharing. Exploiting this mechanism, we propose a preemption-aware kernel scheduling strategy for GPUs. Our strategy improves throughput by co-running complementary kernels. Furthermore, when new kernels arrive, our strategy decides whether to preempt running kernels by weighing the performance benefit of preemption against its overhead using analytic models. Evaluation results show that our strategy improves throughput by 20.1% over sequential execution and 11.5% over an FCFS strategy.
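The core scheduling decision described above, preempting a running kernel only when the modeled benefit outweighs the preemption overhead, can be sketched as follows. This is a hypothetical illustration: the function name and cost terms (context save/restore costs, estimated co-run benefit) are assumptions for exposition, not the paper's actual analytic models.

```python
def should_preempt(benefit_est: float, save_cost: float, restore_cost: float) -> bool:
    """Hypothetical preemption decision rule: preempt only when the
    estimated performance benefit of co-running the new kernel exceeds
    the overhead of saving and restoring the running kernel's context."""
    overhead = save_cost + restore_cost
    return benefit_est > overhead

# Illustrative usage with made-up millisecond estimates:
print(should_preempt(2.0, 0.5, 0.3))  # benefit 2.0 > overhead 0.8 -> True
print(should_preempt(0.4, 0.5, 0.3))  # benefit 0.4 < overhead 0.8 -> False
```

In the paper's setting, the benefit and overhead terms would come from the analytic performance models evaluated when a new kernel arrives.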