
The Architecture Designs And Scheduling Algorithms For Heterogeneous Datacenters

Posted on: 2018-01-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z N Wang
Full Text: PDF
GTID: 1368330590955274
Subject: Computer Science and Technology

Abstract/Summary:
With the development of cloud computing, the system architectures and scheduling algorithms of datacenters, which serve as the foundation of cloud computing, have become important research areas in computer science. Unlike traditional high-performance computing clusters, datacenters must serve many users simultaneously, and those users demand quality of service (QoS) for latency-sensitive applications. Traditionally, datacenters have been equipped with multi-core CPUs to host single-threaded applications with stable workloads, and many algorithms have been proposed to schedule such tasks. Recently, heterogeneous processors such as general-purpose graphics processing units (GPGPUs) have been widely adopted in datacenters to improve performance and energy efficiency, turning them into heterogeneous datacenters. This shift creates four new problems. First, current GPUs are designed for high-performance computing clusters and can only be used exclusively, which does not meet datacenters' need for sharing; new mechanisms are required to let GPUs support highly efficient sharing. Second, GPUs lack hardware support for QoS, leading to resource under-utilization and lower system performance when datacenters run latency-sensitive applications. Third, the adoption of GPUs introduces heterogeneous tasks that use GPUs alone or CPUs and GPUs together, which poses new challenges for scheduling algorithms. Fourth, new applications are multi-threaded with changing workloads, so scheduling algorithms designed for traditional applications no longer fit. Heterogeneous datacenters therefore need new architectures and scheduling algorithms to meet the needs of these new applications. Targeting the heterogeneous processors and new applications in datacenters, we investigate architectural designs and scheduling algorithms for heterogeneous datacenters. Our main contributions are as follows:

1. A highly efficient multitasking GPU architecture. For efficient GPU sharing, we propose a fine-grained sharing architecture, the Simultaneous Multikernel GPU (SMK). It consists of three components: a low-overhead Partial Context Switch, a Fair Static Resource Allocation algorithm, and a Fair Warp Scheduling algorithm. With these schemes, the GPU dynamically allocates resources among sharer kernels and executes them with high fairness and performance. Evaluation shows that SMK improves system throughput by 37% over non-shared execution and by 12.7% over a state-of-the-art design.

2. QoS mechanisms for multitasking GPUs. To provide QoS support on GPUs, we propose QoS mechanisms for fine-grained GPU sharing. They control the progress of kernels on a per-cycle basis through per-kernel thread-level parallelism allocation and QoS-aware warp scheduling. Evaluation shows that our techniques achieve QoS goals 43.8% more often, and deliver 20.5% higher throughput, than previous techniques.

3. A scheduling algorithm for heterogeneous tasks that use both CPUs and GPUs. To balance the workload between CPUs and GPUs, we propose Co-Scheduling Based on Asymptotic Profiling (CAP), which distributes the workload across CPUs and GPUs with optimizations for the performance characteristics of GPUs. CAP accurately predicts the CPU-to-GPU performance ratio at runtime with low overhead and dynamically distributes the workload according to that ratio. Evaluation shows that CAP improves performance by up to 42.7% compared with state-of-the-art algorithms.
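CAP's runtime partitioning can be illustrated with a minimal sketch. The code below is our own simplification, not the dissertation's implementation: the initial split, the throughput figures, and the names (`Ratio`, `refine`, `gpu_share`) are hypothetical, and the real CAP further optimizes for GPU performance characteristics.

```cpp
// Illustrative sketch, not the dissertation's code: split each chunk of a
// data-parallel job between CPU and GPU in proportion to their measured
// throughputs, refining the ratio as profiling data accumulates.
#include <cstdio>

struct Ratio {
    double gpu_share = 0.5;  // initial guess: split the work evenly
};

// After each chunk, recompute the GPU's share from the measured
// per-device throughputs (items processed per second).
void refine(Ratio& r, double cpu_items_per_s, double gpu_items_per_s) {
    r.gpu_share = gpu_items_per_s / (cpu_items_per_s + gpu_items_per_s);
}

int main() {
    Ratio r;
    const int chunk_items = 1000000;
    // Simulated per-chunk throughput measurements (items/s), hypothetical.
    const double cpu_tp[] = {2.0e5, 2.1e5, 2.0e5};
    const double gpu_tp[] = {1.4e6, 1.6e6, 1.6e6};
    for (int i = 0; i < 3; ++i) {
        int gpu_items = static_cast<int>(chunk_items * r.gpu_share);
        std::printf("chunk %d: GPU gets %d items, CPU gets %d\n",
                    i, gpu_items, chunk_items - gpu_items);
        refine(r, cpu_tp[i], gpu_tp[i]);
    }
    return 0;
}
```

Because the share is re-derived from measured throughputs after every chunk, the split settles as profiling data accumulates, which is presumably the behavior the asymptotic-profiling name refers to.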
4. A scheduling algorithm for multi-threaded applications with varying workloads. For QoS-aware scheduling of multi-threaded applications with varying workloads, we propose EMU, a QoS-aware elastic contention management mechanism. EMU uses machine learning to predict the latency of each user query under different resource configurations and allocates just enough resources for the application to meet its QoS target. Evaluation shows that EMU improves the throughput of co-located applications by 36.02% on average compared with the state-of-the-art technique while meeting the 99th-percentile latency target of latency-sensitive applications.
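EMU's allocation idea, predicting query latency under candidate resource configurations and granting the smallest one that still meets the target, can be sketched as follows. The predictor here is a hypothetical stand-in for EMU's learned model, and the knobs (a 16-core budget, a 5 ms tail-latency target, the `predict_latency_ms` formula) are assumptions for illustration only.

```cpp
// Illustrative sketch, not the dissertation's EMU implementation: choose the
// smallest core allocation whose predicted latency meets the QoS target.
#include <cstdio>

// Hypothetical latency predictor standing in for EMU's learned model:
// latency grows with load and shrinks as more cores are allocated.
double predict_latency_ms(double load_qps, int cores) {
    return 2.0 + load_qps / (cores * 20.0);
}

// Smallest allocation that keeps predicted latency within the target.
int allocate_cores(double load_qps, double target_ms, int max_cores) {
    for (int cores = 1; cores <= max_cores; ++cores) {
        if (predict_latency_ms(load_qps, cores) <= target_ms)
            return cores;
    }
    return max_cores;  // saturated: grant the full budget
}

int main() {
    const double target_ms = 5.0;  // assumed 99th-percentile latency target
    const int max_cores = 16;      // assumed machine budget
    const double loads[] = {50.0, 200.0, 800.0};  // queries per second
    for (double qps : loads) {
        int cores = allocate_cores(qps, target_ms, max_cores);
        std::printf("load %.0f qps -> %d cores for the latency-sensitive app, "
                    "%d left for batch jobs\n", qps, cores, max_cores - cores);
    }
    return 0;
}
```

Cores not claimed by the latency-sensitive application remain available to co-located batch jobs, which is where the reported throughput gain for co-located applications would come from.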
Keywords/Search Tags:Datacenters, GPGPU, Fine-grain Sharing, Quality of Service, Task Scheduling