
Hardware solutions to reduce effective memory access time

Posted on: 2002-07-16
Degree: Ph.D.
Type: Dissertation
University: University of Michigan
Candidate: Srinivasan, Vijayalakshmi
Full Text: PDF
GTID: 1468390011497778
Subject: Engineering
Abstract/Summary:
In this dissertation, we provide hardware solutions that increase the efficiency of the cache hierarchy, thereby reducing the effective memory access time. Specifically, we focus on two approaches to reducing the effective memory access time: prefetching and novel cache designs.

In the first part of this dissertation we show that traditional metrics such as coverage and accuracy may be inadequate for evaluating the effectiveness of a prefetch algorithm. Our main contribution is the development of a prefetch traffic and miss taxonomy (PTMT) that provides a complete classification of all prefetches; in particular, the PTMT classification precisely quantifies the direct and indirect effect of each prefetch on traffic and misses.

We show that while most instruction prefetch algorithms do achieve a substantial reduction in misses, they fail to issue the prefetches in a timely fashion. Our branch history guided hardware prefetch algorithm (BHGP) improves the timeliness of instruction prefetches. Our results show that BHGP eliminates, on average, 66% of the I-cache misses for several important commercial and Windows NT applications, as well as applications from the CPU2000 suite with high I-cache miss rates. In addition, BHGP improves IPC by 14 to 18% for the CPU2000 applications studied.

In the second part of this dissertation, we explore novel cache designs to reduce the effective L1 cache access time in light of current technology trends. We show that the straightforward approach of adding one more level of memory hierarchy, an L0 cache between the processor and the L1 cache, does not always reduce the effective cache access time, because of the high miss rate of the L0 cache and the small difference in access latency between L0 and L1. We develop a split latency cache system (SpliCS), an enhanced version of the traditional (L0 + L1) system that uses two primary caches: a small, fast cache (A) and a larger, slower cache (B). Our experiments show that, relative to a similarly configured L1 cache alone, SpliCS achieves an 8% or 18% reduction in CPI with a cache B latency of 3 or 5 cycles, respectively. Moreover, SpliCS achieves an average 15% improvement in CPI relative to a traditional (L0 + L1) hierarchy. (Abstract shortened by UMI.)
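The abstract does not spell out BHGP's mechanism. As a rough sketch of the general idea behind branch-history-guided instruction prefetching (correlate I-cache misses with the branch that preceded them, then prefetch those lines the next time that branch executes), the following Python model may be useful; all names, parameters, and the learning policy here are illustrative assumptions, not details taken from the dissertation.

    # Hypothetical sketch of a branch-guided instruction prefetcher.
    # A table maps a branch PC to I-cache lines that missed shortly after
    # that branch on earlier executions; when the branch executes again,
    # those lines are prefetched ahead of the fetch stream.
    from collections import defaultdict

    LINE_SIZE = 64  # assumed I-cache line size in bytes

    class BranchGuidedPrefetcher:
        def __init__(self, max_lines_per_branch=4):
            self.table = defaultdict(set)  # branch PC -> set of miss line addrs
            self.last_branch = None        # most recently executed branch PC
            self.max_lines = max_lines_per_branch

        def on_branch(self, branch_pc):
            """Called when a branch executes: return lines learned earlier,
            which the hardware would now prefetch into the I-cache."""
            self.last_branch = branch_pc
            return sorted(self.table[branch_pc])

        def on_icache_miss(self, fetch_pc):
            """Called on an I-cache miss: associate the missing line with
            the preceding branch so it can be prefetched next time."""
            if self.last_branch is None:
                return
            line = fetch_pc & ~(LINE_SIZE - 1)  # align to line boundary
            entry = self.table[self.last_branch]
            if len(entry) < self.max_lines:
                entry.add(line)

    # Toy usage: the miss at 0x400404 is learned after branch 0x400100,
    # so the second execution of the branch prefetches that line.
    pf = BranchGuidedPrefetcher()
    pf.on_branch(0x400100)
    pf.on_icache_miss(0x400404)
    print([hex(a) for a in pf.on_branch(0x400100)])  # ['0x400400']

The point of keying on branch history rather than on the miss address itself is timeliness: the branch executes several cycles before the fetch stream reaches the missing line, giving the prefetch time to complete.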
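To make the L0 argument concrete, consider the standard effective (average) memory access time model; this is textbook AMAT reasoning, not a formula quoted from the dissertation. Let t_X denote the access latency of level X and m_X its local miss rate, and assume for simplicity that inserting the L0 leaves the global miss rate to memory roughly unchanged, so the memory terms cancel in the comparison:

    T_{eff}^{L1}      = t_{L1} + m_{L1} \cdot t_{mem}
    T_{eff}^{L0+L1}   = t_{L0} + m_{L0} \cdot t_{L1} + m_{L0} m_{L1} \cdot t_{mem}

Under that assumption, the L0 helps only when t_{L0} + m_{L0} \cdot t_{L1} < t_{L1}, i.e. when

    m_{L0} < \frac{t_{L1} - t_{L0}}{t_{L1}}

For example, with t_{L0} = 2 cycles and t_{L1} = 3 cycles, the L0 must hit more than about 67% of the time just to break even. A small L0 with a high miss rate and a small latency gap to L1 can therefore increase the effective access time, which is exactly the failure mode the abstract identifies and that SpliCS is designed to avoid.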
Keywords/Search Tags:Effective memory, Access time, Cache, Hardware, Reduce, Hierarchy