
An adaptive chip multiprocessor cache hierarchy

Posted on: 2008-03-07
Degree: Ph.D
Type: Thesis
University: University of Colorado at Boulder
Candidate: Settle, M. W. Alexander
Full Text: PDF
GTID: 2448390005969989
Subject: Engineering
Abstract/Summary:
System performance is increasingly coupled to cache hierarchy design as chip multiprocessors (CMPs) grow in core count across generations. Higher core counts require large last-level cache (LLC) capacities to avoid costly off-chip memory bandwidth and the inherent bottleneck of memory requests from multiple active cores. There are currently two schools of thought in CMP cache design: shared versus private last-level caches. At the center of the issue is that CMP systems must serve different classes of workload: the throughput of multiple independent single-threaded applications and the high-performance demands of parallel multi-threaded applications. Maximizing CMP performance in each of these domains requires opposing design choices. As a result, it is necessary to investigate the behavior of both shared and private LLC designs, as well as an adaptive LLC approach that works across multiple workloads.

This thesis proposes a scalable CMP cache hierarchy design that shields applications from inter-process interference while offering generous per-core last-level cache capacity and low round-trip memory latencies. A combination of parallel and serial data mining applications from the NU-MineBench suite, along with scientific workloads from the SPEC OMP suite, is used to evaluate cache hierarchy performance. The results show that the scalable CMP cache hierarchy decreases the average memory latency of the parallel workloads by 45% relative to the private cache configuration and by an average of 15% relative to the shared cache. In addition, memory bandwidth demand is 25% lower than that of the private cache for parallel applications and 30% lower for the serial workloads.
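The shared-versus-private tradeoff described in the abstract can be illustrated with a toy average-memory-access-time (AMAT) model: a private LLC slice is fast to reach but small (lower hit rate), while a shared LLC exposes its full capacity to every core (higher hit rate) at the cost of longer, contended access. This is a minimal sketch, not the thesis's methodology; all latencies and hit rates below are hypothetical placeholders.

```python
# Toy AMAT model for a two-level hierarchy (L1 + LLC + memory).
# All parameters are hypothetical, chosen only to illustrate the tradeoff.

def amat(l1_hit_rate, llc_hit_rate, l1_lat, llc_lat, mem_lat):
    """Average memory access time in cycles."""
    return l1_lat + (1 - l1_hit_rate) * (llc_lat + (1 - llc_hit_rate) * mem_lat)

# Private LLC: short access latency, but a small per-core slice -> lower hit rate.
private_amat = amat(l1_hit_rate=0.90, llc_hit_rate=0.60,
                    l1_lat=2, llc_lat=10, mem_lat=200)

# Shared LLC: full capacity visible to every core -> higher hit rate,
# but a longer (and contended) access path.
shared_amat = amat(l1_hit_rate=0.90, llc_hit_rate=0.80,
                   l1_lat=2, llc_lat=25, mem_lat=200)

print(f"private LLC AMAT: {private_amat:.1f} cycles")  # 11.0 with these numbers
print(f"shared  LLC AMAT: {shared_amat:.1f} cycles")   # 8.5 with these numbers
```

With these example numbers the shared design wins, but shrinking the shared hit rate or raising its contention latency flips the result, which is why an adaptive hierarchy that captures the strengths of both is attractive.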
Keywords/Search Tags:Cache hierarchy, Private cache, Workloads, Applications, Last level cache, Parallel