
Decoupled memory access architectures with speculative pre-execution

Posted on: 2005-04-19
Degree: Ph.D
Type: Dissertation
University: University of Southern California
Candidate: Ro, Wonwoo
Full Text: PDF
GTID: 1458390008478002
Subject: Engineering
Abstract/Summary:
The speed gap between processor and main memory is the major performance bottleneck in modern computer systems. As a result, today's microprocessors suffer frequent cache misses and lose many CPU cycles to pipeline stalls. Although traditional prefetching methods considerably reduce the number of cache misses, most depend strongly on the predictability of future accesses and therefore fail on irregular memory accesses with little locality.

This dissertation proposes an alternative prefetching method based on the access/execute decoupling paradigm. Access/execute decoupling does not rely on predictability; instead, it separates the future access-related instructions from the rest of the program and executes them early to prefetch data. The dissertation introduces two microarchitecture models based on access/execute decoupled architectures. Both provide an effective latency-hiding mechanism.

The HiDISC (Hierarchical Decoupled Instruction Stream Computer) prefetches data effectively even without sufficient access regularity. A dedicated processor is designed for each level of the memory hierarchy, and the processors act in concert to mask long access latencies. Instead of guessing future memory accesses, the actual access-related instructions are separated from the main computation and executed independently on the additional processors.

The SPEAR (Speculative Pre-Execution Assisted by compileR) is a further variation of the HiDISC architecture, developed to bring the prefetching capability of HiDISC to current multithreaded processors such as Simultaneous Multithreading or Chip Multiprocessors. SPEAR decouples only the probable future cache-miss slice from the original code and executes it as an additional helper thread on multithreaded hardware. Because the cache-miss thread is lightweight, it can run faster than the main program thread; as long as it is launched early enough, timely prefetching is achieved.

This dissertation presents the detailed hardware design issues and compiler support for the two proposed models, together with detailed benchmark simulations and performance analysis. The two architectures are compared against an aggressive wide-issue superscalar architecture; both achieve better performance than this baseline, reducing either the cache-miss penalty or the cache-miss rate.
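Although the dissertation describes a hardware mechanism, the access/execute split can be illustrated with a small software analogy. The sketch below is an illustrative assumption, not code from the dissertation (the names `Node`, `access`, `execute`, and `run` are invented here): an access thread walks a pointer chain, the irregular, cache-miss-prone part of the workload, and forwards loaded values through a bounded queue, while the execute thread consumes those values without ever chasing pointers itself, mirroring how a decoupled access processor runs ahead of the compute processor.

```python
import threading
import queue


class Node:
    """A linked-list node: pointer chasing stands in for irregular accesses."""
    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def access(head, q):
    # Access stream: performs all the pointer dereferences (the part that
    # would stall on cache misses) and forwards loaded values.
    node = head
    while node is not None:
        q.put(node.value)        # analogous to a hardware data queue
        node = node.next
    q.put(None)                  # sentinel: end of stream


def execute(q):
    # Execute stream: consumes values from the queue and never touches
    # the linked structure, so it is insulated from pointer-chase latency.
    total = 0
    while (v := q.get()) is not None:
        total += v
    return total


def run(head):
    q = queue.Queue(maxsize=8)   # bounded, like a hardware FIFO
    t = threading.Thread(target=access, args=(head, q))
    t.start()
    total = execute(q)
    t.join()
    return total
```

In hardware, the access processor can slip ahead of the execute processor by the depth of the queue, which is what hides the memory latency; the bounded `maxsize` here plays the role of that finite slip distance.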
Keywords/Search Tags: Architectures, Memory, Cache miss, Performance, Access, Decoupled