Optimizing Throughput and Power Consumption of Graphics Processing Units (GPUs)

Posted on: 2014-03-02

Degree: Ph.D.

Type: Dissertation

University: The University of Wisconsin - Madison

Candidate: Lee, Jung Seob

GTID: 1458390005985861

Subject: Engineering

Abstract/Summary:

Although GPUs were originally developed for processing computer graphics, modern GPUs can execute general-purpose applications that require high computational throughput. Most of the improvement in GPU throughput has come from integrating more cores, operating them at higher frequencies, and providing higher on-chip interconnect and off-chip memory bandwidth. However, GPUs also consume a substantial amount of power because of their many fast cores, which limits further throughput improvement under a given power constraint. Furthermore, GPUs have begun to appear in mobile computing devices that operate under stringent power and energy constraints. It is therefore critical to make GPUs power-efficient. In this dissertation, I propose novel techniques that maximize the throughput or minimize the power consumption of GPUs under a given power or throughput constraint. The techniques are motivated by the observation that the hardware configuration (i.e., the number of running cores) and the operating conditions (i.e., voltage and frequency) that yield the maximum throughput or the minimum power consumption differ from one GPGPU application to another. The proposed approaches use adaptive runtime algorithms that determine the optimal hardware configuration and operating conditions to either maximize throughput or minimize power consumption for a given application.
As technology scales down, increasing within-die (WID) process variations and the shrinking physical size of individual cores lead to notable frequency and leakage-power variations among the cores on a die. Such core-to-core (C2C) frequency and power variations can significantly affect the maximum operating frequency (Fmax) of many-core processors such as GPUs. The slowest core on a die often limits the Fmax of the entire GPU, while the remaining faster cores consume more leakage power, because slow and fast cores have very different transistor characteristics.
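A back-of-the-envelope sketch of how much the slowest core can cost, under loudly stated assumptions: the application is compute-bound with no inter-core synchronization, so throughput is taken to be proportional to the sum of the cores' clock frequencies, and the per-core Fmax values are made up for illustration. With one chip-wide clock, every core runs at the slowest core's Fmax; if each core could instead run at its own Fmax, the aggregate clock rate is simply the sum. The function name `shared_clock_loss` is hypothetical.

```python
from typing import Sequence


def shared_clock_loss(core_fmax_ghz: Sequence[float]) -> tuple[float, float, float]:
    """Compare aggregate clock rate under one chip-wide clock vs per-core clocks.

    Toy model (an assumption, not a measured result): a compute-bound
    application with no inter-core synchronization, so throughput is
    proportional to the sum of the cores' clock frequencies.
    """
    if not core_fmax_ghz:
        raise ValueError("need at least one core")
    slowest = min(core_fmax_ghz)
    shared = len(core_fmax_ghz) * slowest  # every core clocked at the slowest core's Fmax
    per_core = sum(core_fmax_ghz)           # every core clocked at its own Fmax
    loss = 1.0 - shared / per_core          # fraction of potential throughput lost
    return shared, per_core, loss


if __name__ == "__main__":
    # Hypothetical post-silicon Fmax values (GHz) for four cores on one die.
    shared, per_core, loss = shared_clock_loss([1.0, 0.95, 0.9, 0.8])
    print(f"shared clock: {shared:.2f}, per-core clocks: {per_core:.2f}, loss: {loss:.1%}")
```

For these four hypothetical cores, a single shared clock yields 3.2 GHz of aggregate clock rate versus 3.65 GHz with per-core clocks, roughly 12% less; the gap grows as the spread between the slowest and fastest cores widens.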
In this dissertation, I improve the throughput of GPUs by exploiting WID C2C frequency and power variations. GPGPU applications rarely synchronize across cores, which allows a GPU to operate each core at its own Fmax with little synchronization overhead. The proposed approach maximizes throughput by letting each core run at an independent clock frequency generated by a per-core phase-locked loop (PLL). In addition, I observe that problem-size-limited and/or memory-bound applications do not benefit from many cores. I therefore improve the throughput of such applications by disabling the slow cores that limit the Fmax of the GPU, so that the remaining cores can run at a higher frequency. Finally, I further improve throughput by combining existing spatial multitasking techniques with per-core frequency assignment. This technique uses application characteristics to decide which cores to assign to each application, taking advantage of WID variations.
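The following toy sketch illustrates core disabling together with operating-point selection. It is not the dissertation's algorithm: an exhaustive search stands in for the adaptive runtime algorithms, whose details the abstract does not give. It assumes a single shared clock (no per-core PLLs), a hypothetical chip with made-up per-core Fmax and leakage values, throughput of one operation per cycle per useful core capped by memory bandwidth, and dynamic power that is linear in frequency. The search always keeps the fastest k cores, tries each allowed clock level, and returns the feasible configuration with the highest throughput, breaking ties toward lower power. `Core`, `Choice`, `choose_config`, and all parameters are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Core:
    fmax_ghz: float  # hypothetical post-silicon maximum frequency of this core
    leak_w: float    # hypothetical leakage power of this core (faster cores leak more)


@dataclass(frozen=True)
class Choice:
    cores: int
    freq_ghz: float
    throughput_gops: float
    power_w: float


def choose_config(
    cores: Sequence[Core],
    freq_levels_ghz: Sequence[float],
    max_useful_cores: int,   # parallelism the application can use (problem size)
    mem_cap_gops: float,     # throughput ceiling imposed by memory bandwidth
    dyn_w_per_ghz: float,    # toy dynamic-power coefficient per core
    power_budget_w: float,
) -> Optional[Choice]:
    """Exhaustively search (enabled cores, shared clock) for maximum throughput.

    The k fastest cores stay enabled and the rest are disabled, so the shared
    clock can never exceed the Fmax of the slowest enabled core.
    """
    ranked = sorted(cores, key=lambda c: c.fmax_ghz, reverse=True)
    best: Optional[Choice] = None
    for k in range(1, len(ranked) + 1):
        enabled = ranked[:k]
        fmax = enabled[-1].fmax_ghz              # slowest enabled core limits the clock
        leak = sum(c.leak_w for c in enabled)    # disabled cores are assumed to draw no power
        for f in sorted(freq_levels_ghz):
            if f > fmax:
                break
            power = leak + k * dyn_w_per_ghz * f  # toy model: dynamic power linear in f
            if power > power_budget_w:
                continue
            # One op per cycle per useful core, capped by memory bandwidth.
            throughput = min(min(k, max_useful_cores) * f, mem_cap_gops)
            if best is None or throughput > best.throughput_gops or (
                throughput == best.throughput_gops and power < best.power_w
            ):
                best = Choice(k, f, throughput, power)
    return best


if __name__ == "__main__":
    chip = [Core(1.0, 2.0), Core(0.95, 1.8), Core(0.9, 1.6), Core(0.8, 1.2)]
    levels = [0.6, 0.7, 0.8, 0.9, 1.0]
    for label, useful, mem_cap in [
        ("compute-bound", 4, 100.0),
        ("problem-size-limited", 3, 100.0),
        ("memory-bound", 4, 2.0),
    ]:
        print(label, choose_config(chip, levels, useful, mem_cap, 1.0, 100.0))
```

On this hypothetical chip, the compute-bound case keeps all four cores at 0.8 GHz. Limiting useful parallelism to three cores (a problem-size-limited application) disables the slowest core and raises the clock to 0.9 GHz. Capping throughput at 2.0 GOPS (a memory-bound application) also disables the slowest core but drops the clock to 0.7 GHz, the lowest-power way to reach the cap. A real runtime would measure throughput and power online rather than evaluate a closed-form model.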

Keywords/Search Tags:

GPUs, Throughput, Power, Frequency, Cores, WID, Applications

Related items

1	Efficient throughput cores for asymmetric manycore processors
2	Exploiting Parallelism in GPUs
3	Parallel subgraph mining on hybrid platforms: HPC systems, multi-cores and GPUs
4	OS and Runtime Support for Efficiently Managing Cores in Parallel Applications
5	Study on the trade off between throughput and power consumption in the design of Bluetooth Low Energy applications
6	Automatic transformation and optimization of applications on GPUs and GPU clusters
7	Technology impacts of CMOS scaling on microprocessor core design for hard-fault tolerance in single-core applications and optimized throughput in throughput-oriented chip multiprocessors
8	Research On Surface Modification Of Fe-Si-Cr Magnetic Powder Cores
9	Architectural Applications of Radio Frequency Interconnect for Chip-to-DRAM Communication
10	Micro-architectural support for improving synchronization and efficiency of SIMD execution on GPUs