
Improving memory hierarchy performance with hardware prefetching and cache replacement

Posted on: 2003-03-09
Degree: Ph.D
Type: Dissertation
University: University of Michigan
Candidate: Lin, Wei-Fen
Full Text: PDF
GTID: 1468390011985916
Subject: Computer Science
Abstract/Summary:
The growing performance gap caused by high processor clock rates and slow DRAM accesses makes cache misses increasingly expensive. In this dissertation, we show that even with an aggressive, next-generation memory system using four Direct Rambus channels and an integrated one-megabyte level-two cache, a processor still spends over half of its time stalling for L2 misses on the SPEC CPU2000 benchmark suite. The interface between on-chip L2 caches and off-chip high-bandwidth DRAM devices is therefore a primary target for performance improvement. This dissertation develops complementary techniques that mitigate L2 miss penalties and improve overall performance.

First, we attack L2 miss latency. We propose and evaluate a prefetch architecture, integrated with the on-chip L2 cache and memory controllers, that aggressively prefetches large regions of data on demand misses. By scheduling these prefetches only during idle cycles on the memory channel, inserting them into the cache with low replacement priority, and prioritizing them to exploit the DRAM organization, we improve performance significantly on 10 of the 26 SPEC benchmarks without degrading the others. Using density vectors, we also eliminate 70% of superfluous prefetches, minimizing redundant traffic and power consumption.

Second, we reduce L2 miss rates. We propose a novel approach that approximates the decisions of the optimal replacement algorithm (OPT) using last-touch prediction; any replacement policy that improves upon LRU fits into this framework. The central idea is to identify, via prediction, the final reference to a cache block before the block would be evicted under OPT, the “OPT last touch”. We record OPT decisions at specific program points and replay these replacement decisions when the corresponding situation arises again. We also contrast two methods, early eviction and late retention, to approach the problem systematically.
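To make the prefetching idea concrete, the following is a minimal C sketch of region prefetching gated by density vectors, as described above: on a demand miss the controller queues prefetches for the rest of the aligned region, skipping blocks the density vector never saw referenced, and drains the queue only when the channel is idle. All sizes and identifiers (BLOCK_BYTES, REGION_BLOCKS, density_table, enqueue_low_priority) are illustrative assumptions, not the dissertation's actual implementation.

```c
#include <stdint.h>
#include <stdio.h>

#define BLOCK_BYTES    64    /* assumed L2 block size                  */
#define REGION_BLOCKS  64    /* blocks per 4 KB prefetch region        */
#define DENSITY_SETS   1024  /* entries in the density-vector table    */

/* One density vector per region: a bit per block, set when the block is
 * actually referenced. On the next miss to that region, only blocks whose
 * bits are set are prefetched, filtering out superfluous prefetches.     */
static uint64_t density_table[DENSITY_SETS];

static unsigned region_index(uint64_t addr) {
    return (addr / (BLOCK_BYTES * REGION_BLOCKS)) % DENSITY_SETS;
}

/* Called on every L2 access: remember that this block was useful. */
void record_touch(uint64_t addr) {
    unsigned blk = (addr / BLOCK_BYTES) % REGION_BLOCKS;
    density_table[region_index(addr)] |= (1ULL << blk);
}

/* Called on an L2 demand miss: queue prefetches for the rest of the region,
 * skipping blocks the density vector marks as never used.                 */
void schedule_region_prefetch(uint64_t miss_addr,
                              void (*enqueue_low_priority)(uint64_t)) {
    uint64_t region_base = miss_addr & ~(uint64_t)(BLOCK_BYTES * REGION_BLOCKS - 1);
    unsigned miss_blk    = (miss_addr / BLOCK_BYTES) % REGION_BLOCKS;
    uint64_t density     = density_table[region_index(miss_addr)];

    for (unsigned b = 0; b < REGION_BLOCKS; b++) {
        if (b == miss_blk) continue;            /* demand fetch handles it */
        if (!(density & (1ULL << b))) continue; /* predicted useless       */
        enqueue_low_priority(region_base + (uint64_t)b * BLOCK_BYTES);
    }
}

/* Stand-in for the memory controller: drains the queue only while the DRAM
 * channel is idle and fills blocks with low replacement priority.         */
static void enqueue_low_priority(uint64_t addr) {
    printf("prefetch 0x%llx (idle-cycle, low replacement priority)\n",
           (unsigned long long)addr);
}

int main(void) {
    record_touch(0x1000);   /* blocks previously seen as useful */
    record_touch(0x1040);
    schedule_region_prefetch(0x1040, enqueue_low_priority);
    return 0;
}
```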
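The replacement technique can be sketched the same way: a small predictor, indexed by the referencing instruction's PC, remembers program points that previously made an OPT last touch; a hit predicted as a last touch demotes the block so it is evicted early, otherwise the block is retained as under LRU. The table size, PC hashing, and one-bit prediction below are assumptions for illustration only, not the dissertation's mechanism.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define WAYS          8
#define LT_TABLE_SIZE 4096

typedef struct {
    uint64_t tag;
    bool     valid;
    bool     predicted_dead;  /* last-touch predictor says: evict me first */
    uint32_t lru_stamp;
} line_t;

/* One-bit predictor per PC hash: set when that PC was recorded as making the
 * final reference to a block before OPT would have evicted it.             */
static bool last_touch_table[LT_TABLE_SIZE];

static unsigned lt_index(uint64_t pc) { return (pc >> 2) % LT_TABLE_SIZE; }

/* Training: replay recorded OPT decisions -- mark the PC that performed the
 * block's final reference before its OPT eviction as a last-touch point.   */
void train_opt_last_touch(uint64_t last_touch_pc) {
    last_touch_table[lt_index(last_touch_pc)] = true;
}

/* On a hit, consult the predictor: a predicted last touch demotes the block
 * (early eviction); otherwise keep it like LRU would (late retention).     */
void on_hit(line_t *line, uint64_t pc, uint32_t now) {
    if (last_touch_table[lt_index(pc)])
        line->predicted_dead = true;
    else
        line->lru_stamp = now;
}

/* Victim selection: prefer invalid or predicted-dead blocks, else fall back
 * to plain LRU.                                                             */
int choose_victim(line_t set[WAYS]) {
    int lru = 0;
    for (int w = 0; w < WAYS; w++) {
        if (!set[w].valid || set[w].predicted_dead)
            return w;
        if (set[w].lru_stamp < set[lru].lru_stamp)
            lru = w;
    }
    return lru;
}

int main(void) {
    line_t set[WAYS] = {0};
    set[0] = (line_t){ .tag = 0xA, .valid = true, .lru_stamp = 5 };
    set[1] = (line_t){ .tag = 0xB, .valid = true, .lru_stamp = 9 };

    train_opt_last_touch(0x400123);  /* PC recorded as an OPT last touch */
    on_hit(&set[1], 0x400123, 10);   /* this hit is predicted to be a last touch */

    printf("victim way: %d\n", choose_victim(set));  /* prints 1 */
    return 0;
}
```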
Keywords/Search Tags:Cache, Performance, L2 miss, Memory, Replacement, DRAM