Hydra: A chip multiprocessor with support for speculative thread-level parallelization

Posted on:2003-11-19

Degree:Ph.D

Type:Thesis

University:Stanford University

Candidate:Hammond, Lance Stirling

GTID:2468390011486887

Subject:Engineering

Abstract/Summary:

This thesis describes the design and provides a detailed analysis of Hydra, a chip multiprocessor (CMP) made up of four normal MIPS cores, each with its own primary instruction and data caches. The cores are connected to each other, to a shared on-chip secondary cache, and to a high-speed off-chip DRAM interface by a pair of wide, pipelined buses, one specialized for reads and one for writes, with relatively simple cache coherence protocols. Using the shared secondary cache, the basic design supports interprocessor communication latencies on the order of 10 cycles, which allows a much wider variety of programs to be parallelized than is possible with a conventional multichip multiprocessor. Our simulations show that such a design achieves excellent speedup on most matrix-intensive floating-point and multiprogrammed applications. On the large family of integer applications that can really take advantage of the low communication latencies, however, its speedup is at best only comparable to that of a superscalar processor of similar area.

To make parallel execution of integer programs easier, we examined the possibility of adding thread-level speculation (TLS) support to Hydra. With this mechanism, processors are enhanced so that they can attempt to execute threads from a sequential program in parallel without knowing in advance whether the threads are actually parallel. The speculation hardware monitors the data produced and consumed by the different threads to ensure that no thread uses data too early, before it has actually been produced. If a thread does so, it is restarted. In this manner, threads can be generated almost automatically from existing program constructs such as loops or subroutines. Such support can be added to Hydra simply, with a few extra bits attached to the primary caches and some speculation buffers attached to the shared secondary cache. In practice, we found that most of our integer applications could be sped up to a level comparable to or better than that of an equal-area superscalar processor or of our hand-parallelized benchmarks, and with very little programmer effort. However, we usually had to apply several manual optimization techniques to the code to achieve this speedup.
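
The monitor-and-restart idea behind thread-level speculation can be illustrated with a toy software model. The following Python sketch is not Hydra's hardware design: every name in it (`SpeculativeView`, `run_speculatively`, the two loop bodies) is hypothetical, and it assumes a simplified scheme in which all loop iterations first run against the same memory snapshot and dependence violations are checked only when the iterations commit in program order. Real speculation hardware would detect violations while the threads are running.

```python
from typing import Callable, Dict, List, Set, Tuple

Memory = Dict[str, int]


class SpeculativeView:
    """Per-thread view of memory: logs reads and buffers writes until commit."""

    def __init__(self, backing: Memory) -> None:
        self._backing = backing
        self.read_set: Set[str] = set()
        self.write_buffer: Memory = {}

    def load(self, addr: str) -> int:
        # A thread always sees its own buffered writes first.
        if addr in self.write_buffer:
            return self.write_buffer[addr]
        self.read_set.add(addr)
        return self._backing[addr]

    def store(self, addr: str, value: int) -> None:
        self.write_buffer[addr] = value


def run_speculatively(
    body: Callable[[int, SpeculativeView], None],
    n_iters: int,
    memory: Memory,
) -> Tuple[Memory, int]:
    """Run loop iterations as speculative threads; return (memory, restarts)."""
    committed = dict(memory)
    snapshot = dict(memory)

    # Phase 1: every iteration executes against the same starting snapshot,
    # standing in for parallel execution.
    views: List[SpeculativeView] = []
    for i in range(n_iters):
        view = SpeculativeView(snapshot)
        body(i, view)
        views.append(view)

    # Phase 2: commit in original program order. An iteration that read an
    # address written by an earlier iteration used data "too early", so it
    # is restarted against the up-to-date committed state.
    restarts = 0
    written_by_earlier: Set[str] = set()
    for i, view in enumerate(views):
        if view.read_set & written_by_earlier:
            restarts += 1
            view = SpeculativeView(committed)
            body(i, view)
        committed.update(view.write_buffer)
        written_by_earlier |= set(view.write_buffer)
    return committed, restarts


def double_element(i: int, mem: SpeculativeView) -> None:
    """Independent iterations: each touches only its own element."""
    mem.store(f"a{i}", mem.load(f"a{i}") * 2)


def accumulate(i: int, mem: SpeculativeView) -> None:
    """Dependent iterations: each reads the sum written by the previous one."""
    mem.store("sum", mem.load("sum") + i)


if __name__ == "__main__":
    print(run_speculatively(double_element, 4, {f"a{i}": i + 1 for i in range(4)}))
    print(run_speculatively(accumulate, 4, {"sum": 0}))
```

In this model the independent loop commits without any restarts, while the accumulating loop restarts every iteration after the first, yet both end in the same memory state that sequential execution would produce. This is one way to see why speculation preserves correctness automatically but may still need manual code optimization to deliver speedup.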

Keywords/Search Tags:

Hydra, Multiprocessor, Support

Related items

1	Atlas: A dynamically parallelizing chip-multiprocessor for gigascale integration
2	A general approach to multiprocessor scheduling
3	Design And Application Of Multiprocessor Based On NiosⅡ
4	Design Of Multiprocessor Parallel System Based On Vxibus
5	A Multiprocessor Simulator With Power Analysis Based On Simplescalar
6	Analysis Of Cache Misses In Chip Multiprocessor Using Simics
7	Architectural support for high-performance, power-efficient and secure multiprocessor systems
8	The case for a single-chip multiprocessor
9	Scheduling in DSP multiprocessor systems
10	Bus encryption and authentication unit for symmetric shared memory multiprocessor system using GCM-AES