
Architectural support for scalable speculative parallelization in shared-memory multiprocessors

Posted on: 2002-09-06    Degree: Ph.D    Type: Thesis
University: University of Illinois at Urbana-Champaign    Candidate: Cintra, Marcelo Hehl    Full Text: PDF
GTID: 2468390011497232    Subject: Computer Science
Abstract/Summary:
Speculative parallelization aggressively executes in parallel codes that cannot be fully parallelized by the compiler. Past proposals of hardware schemes have mostly focused on single-chip multiprocessors (CMPs), whose effectiveness is necessarily limited by their small size. Very few schemes have attempted this technique in the context of scalable shared-memory systems.

In this thesis, we present and evaluate a new hardware scheme for scalable speculative parallelization. The design requires relatively simple hardware and integrates efficiently into a cache-coherent NUMA system. We use a speculative CMP as the building block, or node, of our scheme.

Simulations show that the proposed architecture delivers good speedups at a modest hardware cost. For a set of important non-analyzable scientific loops, we report average speedups of 5.2 on 16 processors. We show that our applications require support for per-word speculative state; otherwise, performance suffers greatly.

With speculative parallelization, codes that cannot be fully analyzed by the compiler are aggressively executed in parallel. If the hardware detects a cross-thread dependence violation at run time, it squashes the offending threads and reverts to a safe state. Squashing can cripple performance, especially in scalable multiprocessors and in systems that do not support speculative state at the fine granularity of memory words.

In this thesis, we also propose a new approach to reducing the cost of handling cross-thread data dependence violations: run-time learning. Using a new module called the Violation Prediction Table, the hardware learns to stall a thread when it is likely to trigger a squash, and to release it when it is unlikely to trigger one. Simulations of a 16-processor scalable system show that the scheme is very effective.
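The squash condition described above, and the cost of tracking speculative state at line rather than word granularity, can be illustrated with a minimal sketch. This is an assumed simplification, not the thesis's protocol: thread IDs stand in for sequential order, and a write by a predecessor squashes any successor that already read the same tracked unit.

```python
# Sketch (assumed simplification) of cross-thread dependence violation
# detection. A write by thread w violates a speculative read by any
# successor thread r > w of the same tracked unit of memory.

def detect_violations(events, granularity=1):
    """events: list of ('r'|'w', thread_id, word_address) in time order.
    granularity: words per speculative-state entry (1 models per-word
    state; larger values model per-line state and its false squashes).
    Returns the set of squashed thread IDs."""
    reads = {}          # tracked unit -> set of threads that read it
    squashed = set()
    for op, tid, addr in events:
        unit = addr // granularity
        if op == 'r':
            reads.setdefault(unit, set()).add(tid)
        else:
            # A write violates earlier speculative reads by successors.
            for reader in reads.get(unit, ()):
                if reader > tid:
                    squashed.add(reader)
    return squashed

events = [('r', 2, 8), ('w', 1, 8), ('r', 3, 13), ('w', 0, 12)]
print(detect_violations(events, granularity=1))   # {2}: true violation only
print(detect_violations(events, granularity=4))   # {2, 3}: per-line state also
                                                  # squashes thread 3, whose read
                                                  # merely shares a line with the write
```

The second call shows why coarse-grained speculative state hurts: words 12 and 13 fall in the same 4-word line, so thread 3 is squashed even though no word-level dependence exists.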
For a protocol that keeps speculation state on a per-line basis at the system level, learning eliminates on average 84% of the squashes. The resulting system runs on average 43% faster, and its performance is very close to that of a system with perfect prediction.
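The stall-and-release learning policy can be sketched as follows. The indexing key and the use of small saturating counters are assumptions for illustration, not the Violation Prediction Table design in the thesis:

```python
# Sketch, in the spirit of the Violation Prediction Table: learn to stall
# accesses that have recently caused squashes, and release them once they
# stop causing squashes. The 2-bit saturating counters are an assumption.

class ViolationPredictionTable:
    def __init__(self, threshold=2, max_count=3):
        self.counters = {}          # key (e.g. a load's PC) -> counter
        self.threshold = threshold
        self.max_count = max_count

    def should_stall(self, key):
        # Stall the thread if this access looks likely to trigger a squash.
        return self.counters.get(key, 0) >= self.threshold

    def record_squash(self, key):
        # A squash attributed to this access: strengthen the prediction.
        self.counters[key] = min(self.counters.get(key, 0) + 1, self.max_count)

    def record_safe(self, key):
        # The access completed without a violation: weaken the prediction,
        # eventually releasing the thread to run freely again.
        self.counters[key] = max(self.counters.get(key, 0) - 1, 0)

vpt = ViolationPredictionTable()
vpt.record_squash(0x40)
vpt.record_squash(0x40)
print(vpt.should_stall(0x40))   # True: two squashes, stall rather than squash again
vpt.record_safe(0x40)
print(vpt.should_stall(0x40))   # False: released once the access looks safe
</antml>```

The threshold trades squash elimination against needless stalling: too low and safe threads wait, too high and the table reacts after squashes have already crippled performance.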
Keywords/Search Tags: Speculative parallelization, Scalable, Hardware, System, Support