Improving access to shared data in a partitioned global address space programming model

Posted on: 2010-12-03
Degree: Ph.D
Type: Thesis
University: University of Alberta (Canada)
Candidate: Barton, Christopher Mark
Full Text: PDF
GTID: 2448390002474554
Subject: Computer Science
Abstract/Summary:
Partitioned Global Address Space (PGAS) programming languages offer an attractive, high-productivity model for programming large-scale parallel machines. PGAS languages, such as Unified Parallel C (UPC), combine the simplicity of shared-memory programming with the efficiency of the message-passing paradigm. PGAS languages partition the application's address space into private, shared-local, and shared-remote memory. The latency of shared-remote accesses is typically much higher than that of local, private accesses, especially when the underlying hardware is a distributed-memory machine and remote accesses require communication over a network.

This thesis introduces a new locality analysis, describes the implementation of four locality-aware optimizations in a production UPC compiler, and presents a performance evaluation of these techniques. The results of this empirical evaluation indicate that the analysis and code transformations implemented in the compiler are crucial to obtaining good performance and scalability. In some cases the optimized benchmarks run as much as 650 times faster than the unoptimized versions. In addition, the performance of many of the transformed UPC benchmarks is comparable to that of OpenMP and MPI implementations of the same benchmarks.

To achieve good performance, an optimizing compiler must handle two features commonly found in PGAS languages: shared data distribution and a parallel loop construct. When developing a parallel application, the programmer identifies data that is shared among threads and specifies how that data is distributed among them. This thesis introduces new static analyses that allow the compiler to distinguish local shared data from remote shared data. The compiler then uses this information to reduce the cost of shared-data accesses through three techniques: (i) when the compiler can prove that a shared data item is local to the accessing thread, accesses to it are transformed into traditional memory accesses; (ii) when several remote shared-data accesses target data owned by the same thread, a single coalesced access replaces the individual accesses; (iii) when shared-data accesses require explicit communication to move shared data, the compiler overlaps the communication with other computation to hide the communication latency.
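To make these features concrete, the following is a minimal UPC sketch, not code from the thesis: the array name data, the blocking factor BF, and the buffer buf are illustrative. It shows a blocked shared-array declaration (shared data distribution), a upc_forall loop (the parallel loop construct) whose affinity expression makes every access provably local, and a coalesced bulk transfer of a block owned by a single remote thread.

    /* Minimal UPC sketch of the access patterns the abstract describes.
     * Names (data, BF, buf) are hypothetical, chosen for illustration. */
    #include <upc.h>
    #include <stdio.h>

    #define BF 16   /* hypothetical blocking factor */

    /* Shared data distribution: elements BF*t .. BF*t+BF-1 have
     * affinity to thread t (a blocked layout). */
    shared [BF] double data[BF * THREADS];

    int main(void) {
        int i;
        double buf[BF];

        /* Parallel loop construct: the affinity expression &data[i]
         * runs iteration i on the thread that owns data[i]. Every
         * access in the body is therefore local, so a locality
         * analysis can lower it to a traditional memory access
         * (technique i: privatization). */
        upc_forall (i = 0; i < BF * THREADS; i++; &data[i]) {
            data[i] = 2.0 * i;
        }
        upc_barrier;

        /* data[BF] .. data[2*BF-1] are all owned by thread 1, so
         * thread 0 can fetch them in one bulk transfer instead of BF
         * fine-grained remote reads (technique ii: coalescing).
         * Technique iii would issue such a transfer early, via a
         * non-blocking variant, and overlap it with independent
         * computation to hide the communication latency. */
        if (MYTHREAD == 0 && THREADS > 1) {
            upc_memget(buf, &data[BF], BF * sizeof(double));
            printf("first element of remote block: %f\n", buf[0]);
        }
        upc_barrier;
        return 0;
    }

In this sketch the default cyclic distribution would defeat the coalescing example, since consecutive elements would live on different threads; the blocked layout is what lets all BF remote elements be fetched from one owner in a single upc_memget call.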
Keywords/Search Tags:Shared data, Address space, Programming, PGAS languages, Compiler, Access, Parallel, Communication