Optimizing Throughput and Power Consumption of Graphics Processing Units (GPUs)

Posted on: 2014-03-02
Degree: Ph.D
Type: Dissertation
University: The University of Wisconsin - Madison
Candidate: Lee, Jung Seob
Full Text: PDF
GTID: 1458390005985861
Subject: Engineering
Abstract/Summary:
Although GPUs were originally developed for processing computer graphics, modern GPUs can execute general-purpose applications that require high computational throughput. The major improvements in GPU throughput have been achieved by integrating more cores and operating them at higher frequencies, with higher on-chip interconnect and off-chip memory bandwidth. However, GPUs also consume a substantial amount of power because of their many fast cores, which limits further throughput improvement under a given power constraint. Furthermore, GPUs have begun to be used in mobile computing devices that operate under stringent power and energy constraints. Therefore, it is critical to make GPUs power-efficient. In this dissertation, I propose novel techniques that maximize the throughput or minimize the power consumption of GPUs under a given power or throughput constraint. The techniques are motivated by the fact that GPGPU applications exhibit their maximum throughput or minimum power consumption at a hardware configuration (i.e., the number of running cores) and operating conditions (i.e., voltage and frequency) that depend on the application. The proposed approaches use adaptive runtime algorithms that determine the optimal hardware configuration and operating conditions to either maximize throughput or minimize power consumption for a given application.
As technology scales down, increasing within-die (WID) process variations and the decreasing physical size of individual cores lead to notable frequency and leakage power variations among the cores in a die. Such core-to-core (C2C) frequency and power variations can significantly affect the maximum operating frequency (Fmax) of many-core processors such as GPUs. The slowest core in a die often limits the Fmax of a GPU, while the remaining faster cores consume more leakage power, because the slow and fast cores have very different transistor characteristics.
In this dissertation, I improve the throughput of GPUs by exploiting WID C2C frequency and power variations. GPGPU applications rarely synchronize across cores, which enables a GPU to operate each core at its own Fmax with little synchronization overhead. The proposed approach allows independent clock frequencies among cores, using a per-core phase-locked loop (PLL) circuit, to maximize throughput. In addition, I observe that applications bound by problem size and/or memory do not benefit from many cores. Thus, I improve the throughput of such applications by disabling the slow cores that limit the Fmax of a GPU. Finally, I further improve throughput by combining existing spatial multitasking techniques with per-core frequency assignment. This technique uses application characteristics to determine core assignments that take advantage of WID variations.
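The effect of per-core clocking and of disabling slow cores can be illustrated with a toy model. The sketch below is not taken from the dissertation: it assumes that throughput is simply proportional to the sum of the active cores' clock frequencies (compute-bound work, no synchronization cost), and the function names and per-core Fmax values are hypothetical.

```python
# Toy model only: throughput is assumed proportional to the sum of the
# clock frequencies of the active cores. All Fmax values are invented.

FMAX_MHZ = [700, 650, 720, 600, 690, 710, 680, 730]  # hypothetical per-core Fmax


def shared_clock_throughput(fmax_mhz):
    """All given cores share one clock, capped by the slowest of them."""
    return len(fmax_mhz) * min(fmax_mhz)


def per_core_clock_throughput(fmax_mhz):
    """Each core runs at its own Fmax (e.g. one PLL per core)."""
    return sum(fmax_mhz)


def fastest_subset(fmax_mhz, n_cores):
    """Keep only the n_cores fastest cores; the slower ones are disabled."""
    return sorted(fmax_mhz, reverse=True)[:n_cores]


if __name__ == "__main__":
    # Shared clock over all 8 cores: the 600 MHz core limits every core.
    print(shared_clock_throughput(FMAX_MHZ))       # 8 * 600 = 4800
    # Independent clocks: every core contributes its own Fmax.
    print(per_core_clock_throughput(FMAX_MHZ))     # 5480
    # An application that scales to only 4 cores: running on the first
    # four cores versus on the four fastest cores under a shared clock.
    print(shared_clock_throughput(FMAX_MHZ[:4]))   # 4 * 600 = 2400
    print(shared_clock_throughput(fastest_subset(FMAX_MHZ, 4)))  # 4 * 700 = 2800
```

In this model the per-core clocks recover the frequency headroom that the slowest core would otherwise take away from every other core, and an application that cannot use all cores gains from running on the fastest subset. Real behavior also depends on memory bandwidth, voltage, and leakage power, none of which the sketch models.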
Keywords/Search Tags: GPUs, Throughput, Power, GPU, Frequency, Cores, WID, Applications