Exploiting multi-grained parallelism for multiple-instruction-stream architectures

Posted on:1998-06-04

Degree:Ph.D

Type:Thesis

University:Carnegie Mellon University

Candidate:Newburn, Christopher John

GTID:2468390014478825

Subject:Engineering

Abstract/Summary:

Exploiting parallelism is essential to maximizing the performance of an application on a parallel computer. Parallelism is traditionally exploited at two granularities: individual operations are executed in parallel within a processor to exploit instruction-level parallelism, and loop iterations or processes are executed in parallel on different processors to exploit loop-level and process-level parallelism.

A new generation of architectures that execute multiple instruction streams on a single chip has the potential to significantly reduce the gap between communication costs within a processor and between processors. This means that parallelism of multiple granularities can be exploited between instruction streams by overlapping regions of code that range in granularity from a small set of instructions to basic blocks, conditionals, loop iterations, loop nests, procedure calls, and collections of such constructs. This opens the way to exploiting more parallelism in a larger number of applications than has been feasible in the past. Furthermore, it creates a demand for compilation techniques that exploit multi-grained parallelism, that is, the overlap of program regions of different granularities.

This thesis studies the exploitation of multi-grained parallelism. It presents a program representation called the program dependence graph (PDG) and a node labeling scheme that supplements it. These representations have been specialized to expose multi-grained parallelism and facilitate its exploitation on a multiple-instruction-stream architecture. The thesis investigates novel compilation techniques for exploiting multi-grained parallelism and explores the impact of synchronization cost on performance. These techniques perform partitioning, scheduling, and synchronization of a single application for a multiple-instruction-stream architecture. The partitioning techniques make global trade-offs to select the granularity of parallelism to exploit in each part of the program so as to minimize the overall latency for a target architecture.

The thesis describes an implementation of these representations and techniques called Pedigree, which is the first post-pass, retargetable compiler to target multiple-instruction-stream architectures. The SDIO and some SPEC benchmarks have been compiled by Pedigree and used to demonstrate its ability to parallelize code. The best results for exploiting multi-grained parallelism come from overlapping parallelized loop nests, something which is new to this work.
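
The trade-off between granularity and synchronization cost mentioned in the abstract can be illustrated with a minimal sketch. This is not Pedigree's partitioning algorithm, which the abstract does not specify. It is a simplified cost model in which two independent regions are either run back to back on one instruction stream or overlapped on two streams at a fixed synchronization cost. The `Region` class, the function names, and all cycle counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A program region (hypothetical model): a name and an estimated latency."""
    name: str
    latency: int  # estimated cycles when run on a single instruction stream


def overlapped_latency(a: Region, b: Region, sync_cost: int) -> int:
    # Assumption: the regions are independent, so when they run on separate
    # streams the slower one dominates; a fixed synchronization cost is added.
    return max(a.latency, b.latency) + sync_cost


def should_overlap(a: Region, b: Region, sync_cost: int) -> bool:
    # Overlap only if it beats running both regions sequentially.
    sequential = a.latency + b.latency
    return overlapped_latency(a, b, sync_cost) < sequential


# Coarse-grained regions: two loop nests (illustrative numbers only).
nest_a = Region("loop_nest_a", 1000)
nest_b = Region("loop_nest_b", 900)

# Fine-grained regions: two small basic blocks (illustrative numbers only).
block_a = Region("basic_block_a", 8)
block_b = Region("basic_block_b", 6)

print(should_overlap(nest_a, nest_b, sync_cost=50))    # coarse regions, high sync cost
print(should_overlap(block_a, block_b, sync_cost=50))  # fine regions, high sync cost
print(should_overlap(block_a, block_b, sync_cost=2))   # fine regions, low sync cost
```

Under this model, fine-grained regions are worth overlapping only when the synchronization cost is small relative to their latency, which is consistent with the abstract's point that cheaper on-chip communication makes smaller granularities exploitable.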

Keywords/Search Tags:

Parallelism, Multiple-instruction-stream, Architecture, Loop

Related items

1	Instruction-flow Scheduling Mechanism For High-performance SIMD DSP
2	Loop Realization And Optimization Based On X Stream Processor
3	Research On Parallel Processing Architecture For Block Cipher Based On Stream Architecture
4	Research On Instruction Management And System Virtualized Simulation Technique Of Stream Architecture
5	Studies On CRS Crossbar Based Single-Instruction Multiple-Data Stream Computing Architectures
6	High-efficiency Reconfigurable Array Computing: Architecture, Methodology And Application Mapping Technology
7	Study And Implementation Of Stream Instruction's Issuing Mechanism In X Processor
8	The Key Technology Research, Instruction-level Parallelism Compiled
9	The Research And Implementation Of Key Techniques On Block Cipher ASIP
10	Instruction-level Parallelism To Develop Key Technologies To Achieve