
Cache Management for Many-Core Accelerators

Posted on: 2015-01-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X H Chen
Full Text: PDF
GTID: 1108330509960962
Subject: Computer Science and Technology
Abstract/Summary:
Moore’s law is driving the semiconductor industry toward energy-efficient heterogeneous chips and systems. Many-core accelerators such as GPUs have been widely adopted and integrated into general-purpose microprocessors. With the SIMT execution model, GPUs can hide memory latency through massive multi-threading for many regular applications. To support applications with irregular memory access patterns, cache hierarchies have been introduced into GPU architectures to capture locality and mitigate the effect of irregular accesses. However, GPU caches exhibit poor efficiency due to the mismatch between the throughput-oriented execution model and the cache hierarchy design, which limits system performance and energy efficiency.

Existing CPU cache management policies designed for multi-core systems can be suboptimal when applied directly to GPU caches, mainly because they cannot control the working set and resource utilization. Moreover, when massive parallelism is limited by on-chip resources, execution units sit idle waiting for data to return from the memory subsystem, and system energy efficiency suffers. To reduce memory access latency and bandwidth requirements, programmers often have to perform complicated and tedious GPU-specific optimizations, which greatly increases their burden.

We therefore propose a specialized cache management scheme for GPGPUs. The cache replacement policy is aware of warp scheduling and streaming access patterns, and significantly reduces cache pollution and contention. A reuse-distance-based cache bypass policy protects the cache hierarchy and mitigates cache contention. A dynamic monitoring mechanism captures cache contention and resource congestion information through counter sampling at run time. To avoid over-saturating on-chip resources, the bypass policy is coordinated with warp throttling to dynamically control the number of active warps. The innovations and contributions of this thesis include:

1. We propose an adaptive cache replacement and bypass policy tailored to the memory access behavior of the massively parallel GPU execution model. We carry out a detailed simulation-based analysis of the memory access behavior of GPU applications; the results expose GPU cache inefficiency and its root causes. To deal with severe cache pollution and contention, we first port state-of-the-art CPU cache management schemes to the GPU, demonstrating that advanced management schemes can improve GPU cache efficiency, but also that pure bypass policies have limitations. We then combine the most advanced anti-pollution and anti-thrashing cache management schemes and propose an adaptive scheme for GPU streaming patterns and severe cache contention. Experimental results show that this scheme achieves further performance improvements.

2. Given the limitations of pure bypassing, we propose a specialized cache management policy for GPGPUs. The cache hierarchy is protected from contention by a bypass policy based on reuse distance, and contention and resource congestion are detected at run time. To avoid over-saturating on-chip resources, the bypass policy is coordinated with warp throttling to dynamically control the number of active warps, as sketched below. We also propose a simple predictor to dynamically estimate the optimal number of active warps that makes full use of the cache capacity and on-chip resources. Experimental results show that cache efficiency is significantly improved and on-chip resources are better utilized for cache-sensitive benchmarks, yielding harmonic mean IPC improvements of 74% and 17% (maximum 661% and 44%) over the baseline GPU architecture and optimal static warp throttling, respectively.
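To make the coordination concrete, the following C++ sketch shows how a reuse-distance-based bypass decision might be coupled with warp throttling. It is a minimal illustration of the general technique under assumed heuristics, not the thesis's actual mechanism; the names, counters, and thresholds (SetStats, BypassThrottleController, the majority and congestion cut-offs) are all hypothetical.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical sampling counters for one cache set: how many sampled
// accesses were observed, and how many of them reused a line at a
// distance larger than the set's associativity (so it would be evicted
// before reuse anyway).
struct SetStats {
    uint32_t sampled_accesses = 0;
    uint32_t long_reuse_count = 0;
};

struct BypassThrottleController {
    uint32_t active_warps = 1;   // warps currently allowed to issue
    uint32_t max_warps    = 48;  // assumed hardware limit per SIMT core

    // Bypass a fill when sampled reuse distances suggest the line would
    // be evicted before its next use; inserting it would only pollute the set.
    bool should_bypass(const SetStats& s) const {
        if (s.sampled_accesses == 0) return false;
        return 2 * s.long_reuse_count > s.sampled_accesses;  // majority long reuses
    }

    // Coordinate bypassing with warp throttling: when bypassed misses
    // congest on-chip resources (MSHRs, NoC), shrink the active warp set;
    // when contention subsides, release warps to recover parallelism.
    void adjust_warps(double bypass_rate, double congestion) {
        if (bypass_rate > 0.5 && congestion > 0.8)
            active_warps = std::max<uint32_t>(1, active_warps - 1);
        else if (bypass_rate < 0.1 && congestion < 0.5)
            active_warps = std::min<uint32_t>(max_warps, active_warps + 1);
    }
};
```

A simple predictor of the kind the thesis describes would replace the fixed step of one warp with an estimate of the warp count that just fills the cache and on-chip resources; the step-based controller above is only the simplest stand-in.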
3. We present cache-aware power management for GPGPUs. Ideally, the peak performance of a GPU is proportional to the number of execution units and their operating frequencies, but in practice resource requirements vary widely across applications; memory-intensive applications, for example, are likely to be limited by the memory subsystem and thus cannot reach peak throughput. Targeting memory-intensive applications that are limited by the memory subsystem, we aim to save system power and improve energy efficiency without sacrificing performance. Based on monitoring the utilization of system resources (caches and NoC/DRAM bandwidth), we employ a core scaling mechanism to control the number of active SIMT cores and DVFS to scale the operating voltage and frequency, reducing power consumption and improving energy efficiency. A sketch of such a control loop follows.
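As a rough illustration of this control loop, the C++ sketch below scales the active core count and a voltage/frequency level from monitored utilization. The thresholds, the Utilization fields, and the PowerManager interface are assumptions for exposition, not the dissertation's measured policy.

```cpp
#include <algorithm>
#include <cstdint>

// Assumed per-epoch samples: fraction of cache traffic that is useful,
// and NoC/DRAM bandwidth utilization, both in [0, 1].
struct Utilization {
    double cache;
    double bandwidth;
};

class PowerManager {
public:
    explicit PowerManager(uint32_t total_cores) : total_cores_(total_cores) {}

    // Invoked once per sampling epoch with fresh counters.
    void tick(const Utilization& u) {
        if (u.bandwidth > 0.9 && u.cache < 0.5) {
            // Memory-bound phase: bandwidth is saturated and cores mostly
            // stall, so extra cores and higher frequency burn power without
            // adding throughput. Shed both without hurting performance.
            active_cores_ = std::max<uint32_t>(1, active_cores_ - 1);
            vf_level_     = std::max(0, vf_level_ - 1);
        } else if (u.bandwidth < 0.6) {
            // Compute-bound phase: restore cores and frequency toward peak.
            active_cores_ = std::min(total_cores_, active_cores_ + 1);
            vf_level_     = std::min(kMaxVfLevel, vf_level_ + 1);
        }
    }

private:
    static constexpr int kMaxVfLevel = 3;  // index into an assumed V/F table
    uint32_t total_cores_;
    uint32_t active_cores_ = 1;
    int vf_level_ = kMaxVfLevel;
};
```

The key design point, under these assumptions, is that core scaling and DVFS are driven by the same utilization monitor as the cache management policy, so power is shed only in phases where the memory subsystem, not the cores, bounds throughput.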
Keywords: GPGPU, cache management, warp throttling, power control