
Improving memory hierarchy performance with hardware prefetching and cache replacement

Posted on: 2003-03-09
Degree: Ph.D
Type: Dissertation
University: University of Michigan
Candidate: Lin, Wei-Fen
Full Text: PDF
GTID: 1468390011985916
Subject: Computer Science
Abstract/Summary:
The growing performance gap caused by high processor clock rates and slow DRAM accesses makes cache misses increasingly expensive. In this dissertation, we show that even with an aggressive, next-generation memory system using four Direct Rambus channels and an integrated one-megabyte level-two cache, a processor still spends over half of its time stalling for L2 misses on the SPEC CPU2000 benchmark suite. The interface between on-chip L2 caches and off-chip high-bandwidth DRAM devices is therefore a primary target for performance improvement. This dissertation develops complementary techniques that mitigate L2 miss penalties and improve overall performance.

First, we attack L2 miss latency. We propose and evaluate a prefetch architecture, integrated with the on-chip L2 cache and memory controllers, that aggressively prefetches large regions of data on demand misses. By scheduling these prefetches only during idle cycles on the memory channel, inserting them into the cache with low replacement priority, and prioritizing them to exploit the DRAM organization, we improve performance significantly on 10 of the 26 SPEC benchmarks without degrading the others. Using density vectors, we also eliminate 70% of superfluous prefetches, minimizing redundant traffic and power consumption.

Second, we reduce L2 miss rates. We propose a novel approach that approximates the decisions of the optimal replacement algorithm (OPT) using last-touch prediction; any replacement policy that improves upon LRU fits into this framework. The central idea is to identify, via prediction, the final reference to a cache block before the block would be evicted under OPT, the “OPT last touch”. We record OPT decisions at specific program points and replay these replacement decisions when the corresponding situation arises again. We also contrast two methods, early eviction and late retention, to approach the problem systematically.
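To make the prefetching idea concrete, the following is a minimal C sketch of region prefetching gated by density vectors, as described above: on a demand miss the controller queues prefetches for the rest of the aligned region, skipping blocks the density vector never saw referenced, and drains the queue only when the channel is idle. All sizes and identifiers (BLOCK_BYTES, REGION_BLOCKS, density_table, enqueue_low_priority) are illustrative assumptions, not the dissertation's actual implementation.

```c
#include <stdint.h>
#include <stdio.h>

#define BLOCK_BYTES    64    /* assumed L2 block size                  */
#define REGION_BLOCKS  64    /* blocks per 4 KB prefetch region        */
#define DENSITY_SETS   1024  /* entries in the density-vector table    */

/* One density vector per region: a bit per block, set when the block is
 * actually referenced. On the next miss to that region, only blocks whose
 * bits are set are prefetched, filtering out superfluous prefetches.     */
static uint64_t density_table[DENSITY_SETS];

static unsigned region_index(uint64_t addr) {
    return (addr / (BLOCK_BYTES * REGION_BLOCKS)) % DENSITY_SETS;
}

/* Called on every L2 access: remember that this block was useful. */
void record_touch(uint64_t addr) {
    unsigned blk = (addr / BLOCK_BYTES) % REGION_BLOCKS;
    density_table[region_index(addr)] |= (1ULL << blk);
}

/* Called on an L2 demand miss: queue prefetches for the rest of the region,
 * skipping blocks the density vector marks as never used.                 */
void schedule_region_prefetch(uint64_t miss_addr,
                              void (*enqueue_low_priority)(uint64_t)) {
    uint64_t region_base = miss_addr & ~(uint64_t)(BLOCK_BYTES * REGION_BLOCKS - 1);
    unsigned miss_blk    = (miss_addr / BLOCK_BYTES) % REGION_BLOCKS;
    uint64_t density     = density_table[region_index(miss_addr)];

    for (unsigned b = 0; b < REGION_BLOCKS; b++) {
        if (b == miss_blk) continue;            /* demand fetch handles it */
        if (!(density & (1ULL << b))) continue; /* predicted useless       */
        enqueue_low_priority(region_base + (uint64_t)b * BLOCK_BYTES);
    }
}

/* Stand-in for the memory controller: drains the queue only while the DRAM
 * channel is idle and fills blocks with low replacement priority.         */
static void enqueue_low_priority(uint64_t addr) {
    printf("prefetch 0x%llx (idle-cycle, low replacement priority)\n",
           (unsigned long long)addr);
}

int main(void) {
    record_touch(0x1000);   /* blocks previously seen as useful */
    record_touch(0x1040);
    schedule_region_prefetch(0x1040, enqueue_low_priority);
    return 0;
}
```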
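The replacement technique can be sketched the same way: a small predictor, indexed by the referencing instruction's PC, remembers program points that previously made an OPT last touch; a hit predicted as a last touch demotes the block so it is evicted early, otherwise the block is retained as under LRU. The table size, PC hashing, and one-bit prediction below are assumptions for illustration only, not the dissertation's mechanism.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define WAYS          8
#define LT_TABLE_SIZE 4096

typedef struct {
    uint64_t tag;
    bool     valid;
    bool     predicted_dead;  /* last-touch predictor says: evict me first */
    uint32_t lru_stamp;
} line_t;

/* One-bit predictor per PC hash: set when that PC was recorded as making the
 * final reference to a block before OPT would have evicted it.             */
static bool last_touch_table[LT_TABLE_SIZE];

static unsigned lt_index(uint64_t pc) { return (pc >> 2) % LT_TABLE_SIZE; }

/* Training: replay recorded OPT decisions -- mark the PC that performed the
 * block's final reference before its OPT eviction as a last-touch point.   */
void train_opt_last_touch(uint64_t last_touch_pc) {
    last_touch_table[lt_index(last_touch_pc)] = true;
}

/* On a hit, consult the predictor: a predicted last touch demotes the block
 * (early eviction); otherwise keep it like LRU would (late retention).     */
void on_hit(line_t *line, uint64_t pc, uint32_t now) {
    if (last_touch_table[lt_index(pc)])
        line->predicted_dead = true;
    else
        line->lru_stamp = now;
}

/* Victim selection: prefer invalid or predicted-dead blocks, else fall back
 * to plain LRU.                                                             */
int choose_victim(line_t set[WAYS]) {
    int lru = 0;
    for (int w = 0; w < WAYS; w++) {
        if (!set[w].valid || set[w].predicted_dead)
            return w;
        if (set[w].lru_stamp < set[lru].lru_stamp)
            lru = w;
    }
    return lru;
}

int main(void) {
    line_t set[WAYS] = {0};
    set[0] = (line_t){ .tag = 0xA, .valid = true, .lru_stamp = 5 };
    set[1] = (line_t){ .tag = 0xB, .valid = true, .lru_stamp = 9 };

    train_opt_last_touch(0x400123);  /* PC recorded as an OPT last touch */
    on_hit(&set[1], 0x400123, 10);   /* this hit is predicted to be a last touch */

    printf("victim way: %d\n", choose_victim(set));  /* prints 1 */
    return 0;
}
```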
Keywords/Search Tags:Cache, Performance, L2 miss, Memory, Replacement, DRAM