Hardware Support for Productive Partitioned Global Address Space (PGAS) Programming

Posted on:2017-11-19

Degree:Ph.D

Type:Dissertation

University:The George Washington University

Candidate:Serres, Olivier

GTID:1468390014974104

Subject:Computer Engineering

Abstract/Summary:

To exploit the increasing number of transistors, and because of the limits of frequency scaling, the number of cores per chip keeps growing. As many-core chips become ubiquitous, there is a greater need for a more productive and efficient parallel programming model. The easy-to-use, but locality-agnostic, shared memory model (e.g. OpenMP) is unable to efficiently exploit memory locality in systems with Non-Uniform Memory Access (NUMA) and Non-Uniform Cache Access (NUCA) effects. The locality-aware, but explicit, message-passing model (e.g. MPI1) does not provide a productive development environment because of its two-sided communication and its distributed (and isolated) memory model.

The Partitioned Global Address Space (PGAS) programming model strikes a balance between those two extremes: it provides a global address space for ease of use, but partitions it for locality awareness. The user-friendly PGAS memory model, however, comes at a performance cost because of the address mapping it requires. To mitigate this overhead and achieve full performance, compiler optimizations may be applied, but they are often insufficient. Alternatively, manual optimizations can be applied, but they are cumbersome and therefore unproductive. As a result, the overall benefit of PGAS has been severely limited.

In this dissertation, we improve both the productivity and the performance of PGAS by introducing novel hardware support. This PGAS hardware support efficiently handles the complex PGAS address mapping and communication without the intervention of the application developer. Because the new hardware is introduced at the micro-architecture level, fine-grained, low-latency local shared memory accesses are supported. The hardware is also exposed through an ISA extension, so that PGAS compilers can easily exploit it to access and traverse the PGAS memory space efficiently. The automatic code generation eliminates the need for hand-tuning, and thus simultaneously improves both the performance and the productivity of PGAS languages. This research also introduces and evaluates the possibility for the hardware support to handle a variety of PGAS languages.

Results are obtained on two different system implementations. The first is based on the widely adopted full-system simulator Gem5, which allows a precise evaluation of the performance gain. Two prototype compilers supporting the new hardware were created for experimentation by extending the Berkeley Unified Parallel C (UPC) compiler and the Cray Chapel compiler. This allows unmodified code to use the new instructions without any user intervention, thereby creating a productive programming environment. The second, proof-of-concept implementation is a hardware prototype based on the multi-core Leon3 softcore processor running on a Virtex-6 FPGA. This allowed us not only to verify the feasibility of the implementation but also to evaluate the cost of the new hardware and its instructions.

This research has shown very promising results. With benchmarks in UPC and Chapel, including the NAS Parallel Benchmarks implemented in UPC, a speedup of up to 5.5x is demonstrated when using the hardware support with unmodified code. In some cases, unmodified code using this hardware also surpassed the performance of manually optimized UPC code by up to 10%. With Chapel, we obtained speedups of up to 19x. Additionally, the hardware prototype demonstrated that only a very small area increase is needed.
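
The address mapping that the abstract identifies as the source of PGAS overhead can be illustrated with a minimal software sketch. It assumes the standard UPC block-cyclic layout, in which consecutive blocks of a shared array are dealt round-robin to threads. The names `pgas_layout`, `pgas_location`, and `pgas_map` are hypothetical and chosen for illustration only; this is not the dissertation's hardware design or ISA extension, only the kind of index arithmetic that such support would take over from generated code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor for a UPC-style block-cyclic shared array. */
typedef struct {
    size_t block_size;   /* elements per block (the UPC layout qualifier) */
    size_t num_threads;  /* number of threads (THREADS in UPC) */
} pgas_layout;

/* Hypothetical result type: where a global element physically lives. */
typedef struct {
    size_t thread;       /* owning thread (affinity) */
    size_t phase;        /* position inside the block */
    size_t local_index;  /* element index inside the owner's partition */
} pgas_location;

/* Map global element index i to its owner and local position.
 * Blocks are distributed round-robin across threads, so every shared
 * access done in software pays for these divisions and modulo operations. */
static pgas_location pgas_map(pgas_layout layout, size_t i)
{
    assert(layout.block_size > 0 && layout.num_threads > 0);

    size_t block = i / layout.block_size;  /* global block number */
    pgas_location loc;
    loc.thread = block % layout.num_threads;
    loc.phase = i % layout.block_size;
    /* Blocks this thread owns before the current one, times the block
     * size, plus the position inside the current block. */
    loc.local_index = (block / layout.num_threads) * layout.block_size + loc.phase;
    return loc;
}
```

For example, with a block size of 4 and 3 threads, global element 13 falls in block 3, which wraps around to thread 0 at phase 1 and local index 5. Performing this computation on every shared access is what makes unoptimized PGAS code slow, and is why hand-tuned code typically casts shared pointers to local ones wherever affinity is known.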

Keywords/Search Tags:

Hardware, PGAS, Global address space, Productive, Partitioned, Programming, UPC, Performance

Related items

1	Improving access to shared data in a partitioned global address space programming model
2	Scalable task parallel programming in the partitioned global address space
3	Portable high performance and scalability of partitioned global address space languages
4	APGAS-Oriented Resource Management And Optimization
5	Extending PGAS Programming Model On Heterogeneous System
6	Design And Implementation Of DGA A Parallel Programming Model That Support Out-of-core Computing
7	Productive High Performance Parallel Programming with Auto-tuned Domain-Specific Embedded Language
8	Study On Extension Of IP Address Space Combining With Autonomous System Number
9	A dual address space architecture: Implementation and evaluation
10	Product Quality Monitoring And Managing System Based On Manufacturing Process