Prefetch mechanisms that acquire and exploit application specific knowledge

Posted on:2002-10-23

Degree:Ph.D

Type:Dissertation

University:University of Michigan

Candidate:Annavaram, Murali Mohan Kumar

Full Text:PDF

GTID:1468390011497529

Subject:Computer Science

Abstract/Summary:

The large number of cache misses of current applications coupled with the increasing cache miss latencies in current processor designs cause significant performance degradation, even in aggressive out-of-order processors. This dissertation explores two novel prefetching techniques to reduce I-cache and D-cache misses by acquiring and exploiting application specific knowledge.; The first part of this dissertation focuses on reducing I-cache misses of database systems using Call Graph Prefetching (CGP). CGP can be implemented either in software or in hardware. Both implementations are based on the insight that the sequence of function calls in a DBMS is highly predictable. CGP leverages this predictability by analyzing the call graph of a DBMS to prefetch a function that is likely to be called next. We evaluate the performance of CGP on sets of Wisconsin and TPC-H queries running on today's typical out-of-order superscalar processor models and show that CGP reduces the I-cache miss stall time by nearly 50%.; The second part of this dissertation addresses data prefetching. Initially we developed Ancestor Driven Prefetching (ADP). ADP builds ancestor graphs for frequently executed load/store instructions; it exploits the correlation between values defined by an ancestor of a group of memory instructions and the subsequent addresses they access to issue prefetches for the group. ADP suffered both accuracy and timeliness problems that limited its success. But the insights gained from ADP led us to precompute prefetch addresses instead of predicting them.; Dependence Graph Precomputation (DGP) uses a novel approach to data prefetching. Once an instruction is fetched from the I-cache into the Instruction Fetch Queue (IFQ), its dependences are determined and stored as pointers with the instruction in the IFQ. When a load/store instruction that is deemed likely to cause a cache miss enters the IFQ, a Dependence Graph Generator follows the dependence pointers still within the IFQ to generate the dependence graph of those instructions yet to be executed that will determine the address of the load/store instruction. A separate precomputation engine executes these graphs to generate the load/store addresses early enough for accurate prefetching. Our results show DGP reduces the D-cache miss stall time by 47%. Thus the techniques presented in this dissertation, CGP and DGP, take us about half way from an already highly tuned baseline system toward performance with perfect I-cache and perfect D-cache, respectively.

Keywords/Search Tags:

Cache, CGP, Prefetch, Miss, IFQ, ADP

Related items

1	The Design And Implementation Of High Performance Secondary Cache On YHFT-DX DSP
2	Design And Implementation Of A Web Cache And Prefecthing System
3	Research On High Performance Cache And Memory System
4	An Online Cache Miss Rate Curves Generating And Shared Cache Partition Method
5	Research On The Problem Of First Miss In Edge Cache
6	The Design And Implementation Of Non-blocking And Miss-pipeline Global Cache On XDSP Chip
7	Design And Implementation Of Level One Cache Miss Pipelining On High Performance DSP
8	Classification-based Prefetch-Aware Cache Partition Mechanism
9	Application Research Of Data Cache Technology In MIS
10	Design And Implementation Of Data Prefetcher