Improving access to shared data in a partitioned global address space programming model

Posted on: 2010-12-03
Degree: Ph.D
Type: Thesis
University: University of Alberta (Canada)
Candidate: Barton, Christopher Mark
Full Text: PDF
GTID: 2448390002474554
Subject: Computer Science
Abstract/Summary:
Partitioned Global Address Space (PGAS) programming languages offer an attractive, high-productivity model for programming large-scale parallel machines. PGAS languages, such as Unified Parallel C (UPC), combine the simplicity of shared-memory programming with the efficiency of the message-passing paradigm. PGAS languages partition the application's address space into private, shared-local, and shared-remote memory. The latency of shared-remote accesses is typically much higher than that of local, private accesses, especially when the underlying hardware is a distributed-memory machine and remote accesses require communication over a network.

This thesis introduces a new locality analysis, describes the implementation of four locality-aware optimizations in a production UPC compiler, and presents a performance evaluation of these techniques. The results of this empirical evaluation indicate that the analysis and code transformations implemented in the compiler are crucial to obtaining good performance and scalability. In some cases the optimized benchmarks run as much as 650 times faster than the unoptimized versions. In addition, the performance of many of the transformed UPC benchmarks is comparable to that of OpenMP and MPI implementations of the same benchmarks.

To achieve good performance, an optimizing compiler must handle two features commonly found in PGAS languages: shared data distribution and a parallel loop construct. When developing a parallel application, the programmer identifies data that is shared among threads and specifies how that data is distributed among them. This thesis introduces new static analyses that allow the compiler to distinguish local shared data from remote shared data. The compiler then uses this information to reduce the cost of shared-data accesses through three techniques: (i) when the compiler can prove that a shared data item is local to the accessing thread, accesses to it are transformed into traditional memory accesses; (ii) when several remote shared-data accesses target data owned by the same thread, a single coalesced access replaces the individual accesses; (iii) when shared-data accesses require explicit communication to move shared data, the compiler overlaps the communication with other computation to hide the communication latency.
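To make these features concrete, the following is a minimal UPC sketch, not code from the thesis: the array name data, the blocking factor BF, and the buffer buf are illustrative. It shows a blocked shared-array declaration (shared data distribution), a upc_forall loop (the parallel loop construct) whose affinity expression makes every access provably local, and a coalesced bulk transfer of a block owned by a single remote thread.

    /* Minimal UPC sketch of the access patterns the abstract describes.
     * Names (data, BF, buf) are hypothetical, chosen for illustration. */
    #include <upc.h>
    #include <stdio.h>

    #define BF 16   /* hypothetical blocking factor */

    /* Shared data distribution: elements BF*t .. BF*t+BF-1 have
     * affinity to thread t (a blocked layout). */
    shared [BF] double data[BF * THREADS];

    int main(void) {
        int i;
        double buf[BF];

        /* Parallel loop construct: the affinity expression &data[i]
         * runs iteration i on the thread that owns data[i]. Every
         * access in the body is therefore local, so a locality
         * analysis can lower it to a traditional memory access
         * (technique i: privatization). */
        upc_forall (i = 0; i < BF * THREADS; i++; &data[i]) {
            data[i] = 2.0 * i;
        }
        upc_barrier;

        /* data[BF] .. data[2*BF-1] are all owned by thread 1, so
         * thread 0 can fetch them in one bulk transfer instead of BF
         * fine-grained remote reads (technique ii: coalescing).
         * Technique iii would issue such a transfer early, via a
         * non-blocking variant, and overlap it with independent
         * computation to hide the communication latency. */
        if (MYTHREAD == 0 && THREADS > 1) {
            upc_memget(buf, &data[BF], BF * sizeof(double));
            printf("first element of remote block: %f\n", buf[0]);
        }
        upc_barrier;
        return 0;
    }

In this sketch the default cyclic distribution would defeat the coalescing example, since consecutive elements would live on different threads; the blocked layout is what lets all BF remote elements be fetched from one owner in a single upc_memget call.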
Keywords/Search Tags:Shared data, Address space, Programming, PGAS languages, Compiler, Access, Parallel, Communication