Design And Implementation Of High Available Virtualized GPU Resources

Posted on:2017-08-02

Degree:Master

Type:Thesis

Country:China

Candidate:X H Xu

Full Text:PDF

GTID:2428330590488881

Subject:Software engineering

Abstract/Summary:

The Graphics Processing Unit (GPU) has cemented its position in modern computer systems. Its application scenarios range from graphics computing and media transcoding to high-performance computing. Thanks to its parallel nature, the GPU also shows enormous potential for general-purpose computing on graphics processing units (GPGPU). Cloud environments have therefore started to adopt the GPU as a key computing resource in order to provide hybrid computing services. To this end, two full GPU virtualization solutions, gVirt and GPUvm, have been proposed recently.

Current full GPU virtualization solutions, however, are still immature. For example, they lack some crucial Virtual Machine (VM) management functionality, such as checkpointing and migration. In addition, a GPU may crash or hang for various reasons. The current remedy is to reset the GPU hardware through a mechanism provided by modern GPU vendors: when the GPU driver detects a timeout, the operating system can usually recover by resetting the GPU hardware in the driver, but the application may end up with an unpredictable result. This approach sacrifices the execution of applications for the stability of the operating system. A typical cloud environment uses virtualization to consolidate multiple VMs on one physical host. Virtualization, however, has always been a double-edged sword: the benefits of consolidation come at the price of an increased possibility of GPU failure. As a result, the demand for High Availability (HA) in virtualization is heightened.

In this paper, we pioneer a fast, iterative checkpointing mechanism for VMs with full GPU virtualization, based on gVirt. The challenges are how to define the context of a virtual GPU and how to reduce downtime. We are the first to propose command auditing to address the GPU's lack of a dirty-bit mechanism. Further, we propose HAG, an open-source solution that leverages our checkpointing mechanism to back up the whole VM to another host. The backup VM can then take over when the driver detects a GPU hang, which guarantees the high availability of virtualized GPU resources. Since continuous checkpointing via HAG incurs overhead, we analyze the downtime and performance degradation. Our evaluation shows that 1) the downtime of VM migration or backup is only 224-411 ms, and 2) different GPU workloads achieve 77%-92% of performance with a backup interval of several seconds, while our solution occupies only 80-170 Mbit/s of bandwidth during execution.
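
The abstract does not describe how command auditing is implemented, so the following is only an illustrative sketch of the general idea, not the thesis's actual design. It assumes that each GPU command submitted by the guest can be inspected for the memory range it writes, which lets software mark pages dirty without hardware dirty bits, and that each checkpoint round copies only those pages to the backup. The names `GpuCommand`, `DirtyTracker`, `checkpoint_round`, and the 4 KiB page size are all hypothetical.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed page granularity, not taken from the thesis


@dataclass
class GpuCommand:
    """Hypothetical audited command touching [addr, addr + length)."""
    addr: int
    length: int
    writes: bool = True


@dataclass
class DirtyTracker:
    """Software substitute for a dirty bit: pages are marked by auditing commands."""
    dirty_pages: set = field(default_factory=set)

    def audit(self, cmd: GpuCommand) -> None:
        # Read-only or empty commands cannot dirty any page.
        if not cmd.writes or cmd.length <= 0:
            return
        first = cmd.addr // PAGE_SIZE
        last = (cmd.addr + cmd.length - 1) // PAGE_SIZE
        self.dirty_pages.update(range(first, last + 1))

    def take_dirty(self) -> set:
        # Hand over the current dirty set and start a fresh one.
        pages, self.dirty_pages = self.dirty_pages, set()
        return pages


def checkpoint_round(tracker: DirtyTracker, memory: dict, backup: dict) -> int:
    """Copy only pages dirtied since the previous round; return how many were sent."""
    pages = tracker.take_dirty()
    for page in pages:
        backup[page] = memory.get(page, bytes(PAGE_SIZE))
    return len(pages)
```

In a sketch like this, rounds are repeated at the backup interval, so the amount of data sent per round depends on how many pages the audited commands touched rather than on the total size of GPU memory.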

Keywords/Search Tags:

virtualization, GPU, checkpointing, migration, high availability

Related items

1	Design And Implementation Of High Availability Virtualization Management Center
2	Study And Design Of Optimization Strategy Of High Availability In Virtualization
3	The Study Of Live Migration For The Full GPU Virtualization
4	Research And Development Of High-availability Virtualization Management Operating Environment
5	Research On Application-Oriented Lightweight Virtualization High Availability Technology
6	Research And Implementation Of Virtualization Management Framework For High Availability
7	Research And Implementation Of High-availability MPI Parallel Programming Environment And Parallel Programming Methods
8	High Availability Based On Virtualization Technology Campus Network Routing Solution And Implementation
9	Research On Virtual Machine Based Fast Fault Recovery Technology
10	Investigation And Implementation Of High Availability For Monitoring And Configuring Server In CSCloud