
Hardware solutions to reduce effective memory access time

Posted on: 2002-07-16
Degree: Ph.D.
Type: Dissertation
University: University of Michigan
Candidate: Srinivasan, Vijayalakshmi
Full Text: PDF
GTID: 1468390011497778
Subject: Engineering
Abstract/Summary:
In this dissertation, we provide hardware solutions that increase the efficiency of the cache hierarchy, thereby reducing the effective memory access time. Specifically, we focus on two approaches to reducing the effective memory access time: prefetching and novel cache designs.

In the first part of this dissertation we show that traditional metrics such as coverage and accuracy may be inadequate for evaluating the effectiveness of a prefetch algorithm. Our main contribution is the development of a prefetch traffic and miss taxonomy (PTMT) that provides a complete classification of all prefetches; in particular, the PTMT classification precisely quantifies the direct and indirect effect of each prefetch on traffic and misses.

We show that while most instruction prefetch algorithms do achieve a substantial reduction in misses, they fail to issue the prefetches in a timely fashion. Our branch history guided hardware prefetch algorithm (BHGP) improves the timeliness of instruction prefetches. Our results show that BHGP eliminates, on average, 66% of the I-cache misses for several important commercial and Windows NT applications, as well as applications from the CPU2000 suite with high I-cache miss rates. In addition, BHGP improves IPC by 14 to 18% for the CPU2000 applications studied.

In the second part of this dissertation, we explore novel cache designs to reduce the effective L1 cache access time in light of current technology trends. We show that the straightforward approach of adding one more level of memory hierarchy, an L0 cache between the processor and the L1 cache, does not always reduce the effective cache access time, because of the high miss rate of the L0 cache and the small difference in access latency between L0 and L1. We develop a split latency cache system (SpliCS), an enhanced version of the traditional (L0 + L1) system that uses two primary caches: a small, fast cache (A) and a larger, slower cache (B). Our experiments show that, relative to a similarly configured L1 cache alone, SpliCS achieves an 8% or 18% reduction in CPI with a cache B latency of 3 or 5 cycles, respectively. Moreover, SpliCS achieves an average 15% improvement in CPI relative to a traditional (L0 + L1) hierarchy. (Abstract shortened by UMI.)
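The abstract does not spell out BHGP's mechanism. As a rough sketch of the general idea behind branch-history-guided instruction prefetching (correlate I-cache misses with the branch that preceded them, then prefetch those lines the next time that branch executes), the following Python model may be useful; all names, parameters, and the learning policy here are illustrative assumptions, not details taken from the dissertation.

    # Hypothetical sketch of a branch-guided instruction prefetcher.
    # A table maps a branch PC to I-cache lines that missed shortly after
    # that branch on earlier executions; when the branch executes again,
    # those lines are prefetched ahead of the fetch stream.
    from collections import defaultdict

    LINE_SIZE = 64  # assumed I-cache line size in bytes

    class BranchGuidedPrefetcher:
        def __init__(self, max_lines_per_branch=4):
            self.table = defaultdict(set)  # branch PC -> set of miss line addrs
            self.last_branch = None        # most recently executed branch PC
            self.max_lines = max_lines_per_branch

        def on_branch(self, branch_pc):
            """Called when a branch executes: return lines learned earlier,
            which the hardware would now prefetch into the I-cache."""
            self.last_branch = branch_pc
            return sorted(self.table[branch_pc])

        def on_icache_miss(self, fetch_pc):
            """Called on an I-cache miss: associate the missing line with
            the preceding branch so it can be prefetched next time."""
            if self.last_branch is None:
                return
            line = fetch_pc & ~(LINE_SIZE - 1)  # align to line boundary
            entry = self.table[self.last_branch]
            if len(entry) < self.max_lines:
                entry.add(line)

    # Toy usage: the miss at 0x400404 is learned after branch 0x400100,
    # so the second execution of the branch prefetches that line.
    pf = BranchGuidedPrefetcher()
    pf.on_branch(0x400100)
    pf.on_icache_miss(0x400404)
    print([hex(a) for a in pf.on_branch(0x400100)])  # ['0x400400']

The point of keying on branch history rather than on the miss address itself is timeliness: the branch executes several cycles before the fetch stream reaches the missing line, giving the prefetch time to complete.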
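To make the L0 argument concrete, consider the standard effective (average) memory access time model; this is textbook AMAT reasoning, not a formula quoted from the dissertation. Let t_X denote the access latency of level X and m_X its local miss rate, and assume for simplicity that inserting the L0 leaves the global miss rate to memory roughly unchanged, so the memory terms cancel in the comparison:

    T_{eff}^{L1}      = t_{L1} + m_{L1} \cdot t_{mem}
    T_{eff}^{L0+L1}   = t_{L0} + m_{L0} \cdot t_{L1} + m_{L0} m_{L1} \cdot t_{mem}

Under that assumption, the L0 helps only when t_{L0} + m_{L0} \cdot t_{L1} < t_{L1}, i.e. when

    m_{L0} < \frac{t_{L1} - t_{L0}}{t_{L1}}

For example, with t_{L0} = 2 cycles and t_{L1} = 3 cycles, the L0 must hit more than about 67% of the time just to break even. A small L0 with a high miss rate and a small latency gap to L1 can therefore increase the effective access time, which is exactly the failure mode the abstract identifies and that SpliCS is designed to avoid.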
Keywords/Search Tags:Effective memory, Access time, Cache, Hardware, Reduce, Hierarchy