
Architectural support for scalable speculative parallelization in shared-memory multiprocessors

Posted on: 2002-09-06    Degree: Ph.D    Type: Thesis
University: University of Illinois at Urbana-Champaign    Candidate: Cintra, Marcelo Hehl    Full Text: PDF
GTID: 2468390011497232    Subject: Computer Science
Abstract/Summary:
Speculative parallelization aggressively executes in parallel codes that cannot be fully parallelized by the compiler. Past proposals of hardware schemes have mostly focused on single-chip multiprocessors (CMPs), whose effectiveness is necessarily limited by their small size. Very few schemes have attempted this technique in the context of scalable shared-memory systems.

In this thesis, we present and evaluate a new hardware scheme for scalable speculative parallelization. The design requires relatively simple hardware and integrates efficiently into a cache-coherent NUMA system. We use a speculative CMP as the building block, or node, of our scheme.

Simulations show that the proposed architecture delivers good speedups at a modest hardware cost. For a set of important non-analyzable scientific loops, we report average speedups of 5.2 on 16 processors. We show that our applications require support for per-word speculative state; otherwise, performance suffers greatly.

With speculative parallelization, codes that cannot be fully analyzed by the compiler are aggressively executed in parallel. If the hardware detects a cross-thread dependence violation at run time, it squashes the offending threads and reverts to a safe state. Squashing can cripple performance, especially in scalable multiprocessors and in systems that do not support speculative state at the fine granularity of memory words.

In this thesis, we also propose a new approach to reducing the cost of handling cross-thread data dependence violations: run-time learning. Using a new module called the Violation Prediction Table, the hardware learns to stall a thread when it is likely to trigger a squash, and to release it when it is unlikely to trigger one. Simulations of a 16-processor scalable system show that the scheme is very effective.
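The squash condition described above, and the cost of tracking speculative state at line rather than word granularity, can be illustrated with a minimal sketch. This is an assumed simplification, not the thesis's protocol: thread IDs stand in for sequential order, and a write by a predecessor squashes any successor that already read the same tracked unit.

```python
# Sketch (assumed simplification) of cross-thread dependence violation
# detection. A write by thread w violates a speculative read by any
# successor thread r > w of the same tracked unit of memory.

def detect_violations(events, granularity=1):
    """events: list of ('r'|'w', thread_id, word_address) in time order.
    granularity: words per speculative-state entry (1 models per-word
    state; larger values model per-line state and its false squashes).
    Returns the set of squashed thread IDs."""
    reads = {}          # tracked unit -> set of threads that read it
    squashed = set()
    for op, tid, addr in events:
        unit = addr // granularity
        if op == 'r':
            reads.setdefault(unit, set()).add(tid)
        else:
            # A write violates earlier speculative reads by successors.
            for reader in reads.get(unit, ()):
                if reader > tid:
                    squashed.add(reader)
    return squashed

events = [('r', 2, 8), ('w', 1, 8), ('r', 3, 13), ('w', 0, 12)]
print(detect_violations(events, granularity=1))   # {2}: true violation only
print(detect_violations(events, granularity=4))   # {2, 3}: per-line state also
                                                  # squashes thread 3, whose read
                                                  # merely shares a line with the write
```

The second call shows why coarse-grained speculative state hurts: words 12 and 13 fall in the same 4-word line, so thread 3 is squashed even though no word-level dependence exists.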
For a protocol that keeps speculation state on a per-line basis at the system level, learning eliminates on average 84% of the squashes. The resulting system runs on average 43% faster, and its performance is very close to that of a system with perfect prediction.
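The stall-and-release learning policy can be sketched as follows. The indexing key and the use of small saturating counters are assumptions for illustration, not the Violation Prediction Table design in the thesis:

```python
# Sketch, in the spirit of the Violation Prediction Table: learn to stall
# accesses that have recently caused squashes, and release them once they
# stop causing squashes. The 2-bit saturating counters are an assumption.

class ViolationPredictionTable:
    def __init__(self, threshold=2, max_count=3):
        self.counters = {}          # key (e.g. a load's PC) -> counter
        self.threshold = threshold
        self.max_count = max_count

    def should_stall(self, key):
        # Stall the thread if this access looks likely to trigger a squash.
        return self.counters.get(key, 0) >= self.threshold

    def record_squash(self, key):
        # A squash attributed to this access: strengthen the prediction.
        self.counters[key] = min(self.counters.get(key, 0) + 1, self.max_count)

    def record_safe(self, key):
        # The access completed without a violation: weaken the prediction,
        # eventually releasing the thread to run freely again.
        self.counters[key] = max(self.counters.get(key, 0) - 1, 0)

vpt = ViolationPredictionTable()
vpt.record_squash(0x40)
vpt.record_squash(0x40)
print(vpt.should_stall(0x40))   # True: two squashes, stall rather than squash again
vpt.record_safe(0x40)
print(vpt.should_stall(0x40))   # False: released once the access looks safe
</antml>```

The threshold trades squash elimination against needless stalling: too low and safe threads wait, too high and the table reacts after squashes have already crippled performance.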
Keywords/Search Tags: Speculative parallelization, Scalable, Hardware, System, Support