Exploiting multi-grained parallelism for multiple-instruction-stream architectures

Posted on: 1998-06-04
Degree: Ph.D.
Type: Thesis
University: Carnegie Mellon University
Candidate: Newburn, Christopher John
Full Text: PDF
GTID: 2468390014478825
Subject: Engineering
Abstract/Summary:
Exploiting parallelism is an essential part of maximizing the performance of an application on a parallel computer. Parallelism is traditionally exploited at two granularities: individual operations are executed in parallel within a processor to exploit instruction-level parallelism, and loop iterations or processes are executed in parallel on different processors to exploit loop-level and process-level parallelism.

A new generation of architectures that execute multiple instruction streams on a single chip has the potential to significantly reduce the gap between communication costs within a processor and between processors. This means that parallelism of multiple granularities can be exploited between instruction streams by overlapping regions of code that range in granularity from a small set of instructions to basic blocks, conditionals, loop iterations, loop nests, procedure calls, and collections of such constructs. This opens the way to exploiting more parallelism in a larger number of applications than has been feasible in the past. Furthermore, it creates a demand for compilation techniques that exploit multi-grained parallelism, that is, the overlap of program regions of different granularities.

This thesis studies the exploitation of multi-grained parallelism. It presents a program representation called the program dependence graph (PDG) and a node labeling scheme that supplements it. These representations have been specialized to expose multi-grained parallelism and facilitate its exploitation on a multiple-instruction-stream architecture. The thesis investigates novel compilation techniques for exploiting multi-grained parallelism and explores the impact of synchronization cost on performance. These techniques perform partitioning, scheduling, and synchronization of a single application for a multiple-instruction-stream architecture. The partitioning techniques make global trade-offs to select the granularity of parallelism to exploit in each part of the program so as to minimize the overall latency for a target architecture.

The thesis describes an implementation of these representations and techniques called Pedigree, which is the first post-pass, retargetable compiler to target multiple-instruction-stream architectures. The SDIO and some SPEC benchmarks have been compiled by Pedigree and used to demonstrate its ability to parallelize code. The best results for exploiting multi-grained parallelism come from overlapping parallelized loop nests, something that is new to this work.
Keywords/Search Tags: Parallelism, Multiple-instruction-stream, Architecture, Loop