
Decoupled memory access architectures with speculative pre-execution

Posted on: 2005-04-19
Degree: Ph.D
Type: Dissertation
University: University of Southern California
Candidate: Ro, Wonwoo
Full Text: PDF
GTID: 1458390008478002
Subject: Engineering
Abstract/Summary:
The speed gap between processor and main memory is the major performance bottleneck in modern computer systems. As a result, today's microprocessors suffer frequent cache misses and lose many CPU cycles to pipeline stalls. Although traditional prefetching methods considerably reduce the number of cache misses, most depend strongly on the predictability of future accesses and therefore fail on irregular memory accesses with little locality.

This dissertation proposes an alternative prefetching method based on the access/execute decoupling paradigm. Access/execute decoupling does not rely on predictability; instead, it separates the future access-related instructions from the rest of the program and executes them early to prefetch data. The dissertation introduces two microarchitecture models based on access/execute decoupled architectures. Both provide an effective latency-hiding mechanism.

The HiDISC (Hierarchical Decoupled Instruction Stream Computer) prefetches data effectively even without sufficient access regularity. A dedicated processor is designed for each level of the memory hierarchy, and the processors act in concert to mask long access latencies. Instead of guessing future memory accesses, the actual access-related instructions are separated from the main computation and executed independently on the additional processors.

The SPEAR (Speculative Pre-Execution Assisted by compileR) is a further variation of the HiDISC architecture, developed to bring the prefetching capability of HiDISC to current multithreaded processors such as Simultaneous Multithreading or Chip Multiprocessors. SPEAR decouples only the probable future cache-miss slice from the original code and executes it as an additional helper thread on multithreaded hardware. Because the cache-miss thread is lightweight, it can run faster than the main program thread; as long as it is launched early enough, timely prefetching is achieved.

This dissertation presents the detailed hardware design issues and compiler support for the two proposed models, together with detailed benchmark simulations and performance analysis. The two architectures are compared against an aggressive wide-issue superscalar architecture; both achieve better performance than this baseline, reducing either the cache-miss penalty or the cache-miss rate.
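Although the dissertation describes a hardware mechanism, the access/execute split can be illustrated with a small software analogy. The sketch below is an illustrative assumption, not code from the dissertation (the names `Node`, `access`, `execute`, and `run` are invented here): an access thread walks a pointer chain, the irregular, cache-miss-prone part of the workload, and forwards loaded values through a bounded queue, while the execute thread consumes those values without ever chasing pointers itself, mirroring how a decoupled access processor runs ahead of the compute processor.

```python
import threading
import queue


class Node:
    """A linked-list node: pointer chasing stands in for irregular accesses."""
    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def access(head, q):
    # Access stream: performs all the pointer dereferences (the part that
    # would stall on cache misses) and forwards loaded values.
    node = head
    while node is not None:
        q.put(node.value)        # analogous to a hardware data queue
        node = node.next
    q.put(None)                  # sentinel: end of stream


def execute(q):
    # Execute stream: consumes values from the queue and never touches
    # the linked structure, so it is insulated from pointer-chase latency.
    total = 0
    while (v := q.get()) is not None:
        total += v
    return total


def run(head):
    q = queue.Queue(maxsize=8)   # bounded, like a hardware FIFO
    t = threading.Thread(target=access, args=(head, q))
    t.start()
    total = execute(q)
    t.join()
    return total
```

In hardware, the access processor can slip ahead of the execute processor by the depth of the queue, which is what hides the memory latency; the bounded `maxsize` here plays the role of that finite slip distance.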
Keywords/Search Tags: Architectures, Memory, Cache miss, Performance, Access, Decoupled