
Techniques utilizing memory reference characteristics for improved performance

Posted on: 2003-04-30    Degree: Ph.D.    Type: Dissertation
University: University of Washington    Candidate: Wong, Wayne Anthony    Full Text: PDF
GTID: 1468390011988308    Subject: Computer Science
This dissertation explores three aspects of reducing memory latency by exploiting characteristics of the second-level cache miss stream.

Accessing data from main memory is two orders of magnitude slower than accessing a register within the processor, so reducing main memory latency is paramount for continued overall processor performance improvement. The prevailing solution is a cache. Most cache research to date has assumed either simple cache geometries, relatively small miss latencies, or simple microarchitectures. Given current trends in computer architecture, techniques demonstrated in the past may not remain as effective.

In the first part of the dissertation, I explore a mechanism for reducing the number of cache misses. Recognizing that there is room to improve upon the traditional least recently used (LRU) replacement algorithm, I describe a new cache replacement mechanism, Reference Locality Replacement (RLR). RLR deviates from strict LRU replacement priorities by allowing older cache lines predicted to have temporal locality to remain in the cache. The ability of RLR to reduce cache misses is demonstrated with both novel software- and hardware-directed replacement policies.

In the second part of the dissertation, I examine the capability of hardware prefetching techniques to hide the latency of cache misses. With an aggressive superscalar microarchitecture and contemporary main memory latencies, I demonstrate that prefetches must be initiated more than one cache miss ahead in order to completely hide the memory latency. As a result, prefetching strategies that fetch only the next cache miss will not scale well as the memory gap continues to grow. I reconfirm the ability of stream buffers to prefetch effectively for scientific applications.
In contrast, I show that the Markov and linked data structure prefetchers generally fail to prefetch effectively.

In the third part of the dissertation, I describe methods for reducing main memory latency by exploiting the structure of memory devices, which exhibit non-uniform access latencies. Using a device's large row buffer as a single-entry cache, the latency of memory reads is reduced by exploiting locality at a larger granularity. Effectively managing this faster access mode is demonstrated with two dynamic memory controllers that recognize the temporal and spatial locality in the cache miss stream.
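The replacement idea in the first part can be illustrated with a minimal sketch: an LRU-ordered cache set in which lines predicted to have temporal locality are "protected" and skipped as eviction victims while an unprotected victim exists. The class name, the `protected` flag, and the prediction interface are illustrative assumptions, not the dissertation's actual RLR design.

```python
from collections import OrderedDict

class RLRSet:
    """Toy cache set: LRU order, but lines flagged as having temporal
    locality are skipped as eviction victims while an unprotected line
    exists. Falls back to strict LRU when every line is protected.
    (Illustrative sketch only; not the dissertation's RLR hardware.)"""

    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()  # tag -> protected flag, LRU-first order

    def access(self, tag, protected=False):
        """Returns True on hit, False on miss."""
        if tag in self.lines:
            self.lines.move_to_end(tag)        # refresh LRU position
            self.lines[tag] |= protected
            return True
        if len(self.lines) >= self.ways:
            # Victim: oldest unprotected line; else strict LRU head.
            victim = next((t for t, p in self.lines.items() if not p),
                          next(iter(self.lines)))
            del self.lines[victim]
        self.lines[tag] = protected
        return False
```

For example, in a 2-way set holding a protected line A and an unprotected line B, a miss on C evicts B rather than the LRU-oldest A, so a later reference to A still hits.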
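The scaling argument in the second part can be made concrete with a steady-state approximation: if misses occur every `g` cycles and memory takes `L` cycles, a prefetch must be launched roughly `ceil(L / g)` misses ahead to arrive in time. The cycle counts below are illustrative assumptions, not measurements from the dissertation.

```python
import math

def prefetch_distance(mem_latency_cycles, cycles_between_misses):
    """Steady-state approximation of how many cache misses ahead a
    prefetch must be initiated so the data arrives before it is
    needed. Illustrative model only; parameter values are assumed."""
    return math.ceil(mem_latency_cycles / cycles_between_misses)
```

With an assumed 400-cycle memory latency and a miss every 150 cycles, `prefetch_distance(400, 150)` is 3, so a prefetcher that targets only the next miss hides just a fraction of the latency; as the memory gap grows the required distance grows with it.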
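The row-buffer idea in the third part can be sketched as a toy open-page controller: each bank's open row acts as a single-entry cache, so a read to the same row returns at a lower latency than one requiring a precharge and activate. This sketch uses a static open-page policy and assumed latencies; the dissertation's two dynamic controllers, which adapt to temporal and spatial locality, are not modeled here.

```python
class OpenPageController:
    """Toy DRAM timing model: the open row buffer is a single-entry
    cache per bank. Latency constants are assumed for illustration;
    a row hit avoids the precharge + activate of a row miss."""

    ROW_HIT_CYCLES = 30    # assumed: data read straight from row buffer
    ROW_MISS_CYCLES = 90   # assumed: precharge + activate + read

    def __init__(self):
        self.open_row = {}  # bank -> row left open after the last access

    def read(self, bank, row):
        """Returns the access latency in cycles for this read."""
        hit = self.open_row.get(bank) == row
        self.open_row[bank] = row          # open-page: keep the row open
        return self.ROW_HIT_CYCLES if hit else self.ROW_MISS_CYCLES
```

Consecutive reads to the same row of a bank pay the miss latency once and the hit latency thereafter, which is the larger-granularity locality the abstract describes exploiting.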
Keywords/Search Tags:Memory, Cache, Stream, Techniques, Reducing, Locality, Dissertation