
Techniques utilizing memory reference characteristics for improved performance

Posted on: 2003-04-30    Degree: Ph.D.    Type: Dissertation
University: University of Washington    Candidate: Wong, Wayne Anthony    Full Text: PDF
GTID: 1468390011988308    Subject: Computer Science
This dissertation explores three aspects of reducing memory latency by exploiting characteristics of the second-level cache miss stream.

Accessing data from main memory is two orders of magnitude slower than accessing a register within the processor, so reducing main memory latency is paramount for continued overall processor performance improvement. The prevailing solution is a cache. Most cache research to date has assumed either simple cache geometries, relatively small miss latencies, or simple microarchitectures. Given current trends in computer architecture, techniques demonstrated in the past may not remain as effective.

In the first part of the dissertation, I explore a mechanism for reducing the number of cache misses. Recognizing that there is room to improve upon the traditional least recently used (LRU) replacement algorithm, I describe a new cache replacement mechanism, Reference Locality Replacement (RLR). RLR deviates from strict LRU replacement priorities by allowing older cache lines predicted to have temporal locality to remain in the cache. The ability of RLR to reduce cache misses is demonstrated with both novel software- and hardware-directed replacement policies.

In the second part of the dissertation, I examine the capability of hardware prefetching techniques to hide the latency of cache misses. With an aggressive superscalar microarchitecture and contemporary main memory latencies, I demonstrate that prefetches must be initiated more than one cache miss ahead in order to completely hide the memory latency. As a result, prefetching strategies that fetch only the next cache miss will not scale well as the memory gap continues to grow. I reconfirm the ability of stream buffers to prefetch effectively for scientific applications.
In contrast, I show that the Markov and linked data structure prefetchers generally fail to prefetch effectively.

In the third part of the dissertation, I describe methods for reducing main memory latency by exploiting the structure of memory devices, which exhibit non-uniform access latencies. Using a device's large row buffer as a single-entry cache, the latency of memory reads is reduced by exploiting locality at a larger granularity. Effectively managing this faster access mode is demonstrated with two dynamic memory controllers that recognize the temporal and spatial locality in the cache miss stream.
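The replacement idea in the first part can be illustrated with a minimal sketch: an LRU-ordered cache set in which lines predicted to have temporal locality are "protected" and skipped as eviction victims while an unprotected victim exists. The class name, the `protected` flag, and the prediction interface are illustrative assumptions, not the dissertation's actual RLR design.

```python
from collections import OrderedDict

class RLRSet:
    """Toy cache set: LRU order, but lines flagged as having temporal
    locality are skipped as eviction victims while an unprotected line
    exists. Falls back to strict LRU when every line is protected.
    (Illustrative sketch only; not the dissertation's RLR hardware.)"""

    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()  # tag -> protected flag, LRU-first order

    def access(self, tag, protected=False):
        """Returns True on hit, False on miss."""
        if tag in self.lines:
            self.lines.move_to_end(tag)        # refresh LRU position
            self.lines[tag] |= protected
            return True
        if len(self.lines) >= self.ways:
            # Victim: oldest unprotected line; else strict LRU head.
            victim = next((t for t, p in self.lines.items() if not p),
                          next(iter(self.lines)))
            del self.lines[victim]
        self.lines[tag] = protected
        return False
```

For example, in a 2-way set holding a protected line A and an unprotected line B, a miss on C evicts B rather than the LRU-oldest A, so a later reference to A still hits.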
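The scaling argument in the second part can be made concrete with a steady-state approximation: if misses occur every `g` cycles and memory takes `L` cycles, a prefetch must be launched roughly `ceil(L / g)` misses ahead to arrive in time. The cycle counts below are illustrative assumptions, not measurements from the dissertation.

```python
import math

def prefetch_distance(mem_latency_cycles, cycles_between_misses):
    """Steady-state approximation of how many cache misses ahead a
    prefetch must be initiated so the data arrives before it is
    needed. Illustrative model only; parameter values are assumed."""
    return math.ceil(mem_latency_cycles / cycles_between_misses)
```

With an assumed 400-cycle memory latency and a miss every 150 cycles, `prefetch_distance(400, 150)` is 3, so a prefetcher that targets only the next miss hides just a fraction of the latency; as the memory gap grows the required distance grows with it.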
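The row-buffer idea in the third part can be sketched as a toy open-page controller: each bank's open row acts as a single-entry cache, so a read to the same row returns at a lower latency than one requiring a precharge and activate. This sketch uses a static open-page policy and assumed latencies; the dissertation's two dynamic controllers, which adapt to temporal and spatial locality, are not modeled here.

```python
class OpenPageController:
    """Toy DRAM timing model: the open row buffer is a single-entry
    cache per bank. Latency constants are assumed for illustration;
    a row hit avoids the precharge + activate of a row miss."""

    ROW_HIT_CYCLES = 30    # assumed: data read straight from row buffer
    ROW_MISS_CYCLES = 90   # assumed: precharge + activate + read

    def __init__(self):
        self.open_row = {}  # bank -> row left open after the last access

    def read(self, bank, row):
        """Returns the access latency in cycles for this read."""
        hit = self.open_row.get(bank) == row
        self.open_row[bank] = row          # open-page: keep the row open
        return self.ROW_HIT_CYCLES if hit else self.ROW_MISS_CYCLES
```

Consecutive reads to the same row of a bank pay the miss latency once and the hit latency thereafter, which is the larger-granularity locality the abstract describes exploiting.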
Keywords/Search Tags:Memory, Cache, Stream, Techniques, Reducing, Locality, Dissertation