Portable high performance and scalability of partitioned global address space languages

Posted on:2008-01-09

Degree:Ph.D

Type:Thesis

University:Rice University

Candidate:Coarfa, Cristian

GTID:2448390005470068

Subject:Computer Science

Abstract/Summary:

Large-scale parallel simulations are fundamental tools for engineers and scientists. Consequently, it is critical to develop programming models that enhance development-time productivity and enable harnessing of massively parallel systems, as well as tools that guide the diagnosis of poorly scaling programs. This thesis addresses this challenge in two ways. First, we show that Co-array Fortran (CAF), a partitioned global address space programming model, can be used to write scientific codes that exhibit high performance on modern parallel systems. Second, we describe a novel technique for analyzing parallel program performance and identifying scalability bottlenecks, and we apply it across multiple programming models.

Although the message-passing parallel programming model provides both portability and high performance, it is cumbersome to program. CAF eases this burden by providing a partitioned global address space, but until now it has been implemented only on shared-memory machines. To significantly broaden CAF's appeal, we show that CAF programs can deliver high performance on commodity cluster platforms. We designed and implemented cafc, the first multiplatform CAF compiler, which transforms CAF programs into Fortran 90 plus communication primitives. Our studies show that CAF applications matched or exceeded the performance of the corresponding message-passing programs. For good node performance, cafc employs an automatic transformation called procedure splitting; for high performance on clusters, we vectorize and aggregate communication at the source level. We also extend CAF with hints that enable overlap of communication with computation. Overall, our experiments show that CAF versions of the NAS benchmarks match the performance of their MPI counterparts on multiple platforms.

The increasing scale of parallel systems makes it critical to pinpoint and fix scalability bottlenecks in parallel programs. To automate this process, we present a novel analysis technique that uses parallel scaling expectations to compute scalability scores for calling contexts and then guides an analyst to hot spots using an interactive viewer. Our technique is general and can thus be applied to several programming models; in particular, we used it to analyze CAF and MPI codes, among others. Applying our analysis to CAF programs highlighted the need for language-level collective operations, which we both propose and evaluate.
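
Why vectorizing communication at the source level pays off can be illustrated with a small cost model. This is a sketch under stated assumptions, not cafc's implementation: `RemoteImage`, `fetch_elementwise`, and `fetch_vectorized` are hypothetical names, a Python object stands in for a co-array section owned by another image, and each message is charged with the simple latency-bandwidth (Hockney) model alpha + beta * bytes using made-up parameter values. It contrasts a naive one-message-per-element translation of a loop such as `a(i) = b(i)[k]` with a single bulk transfer of the section `b(1:n)[k]`.

```python
from dataclasses import dataclass, field


@dataclass
class RemoteImage:
    """Hypothetical stand-in for a co-array section owned by another image.

    Every get() call models one network message, charged with a simple
    latency-bandwidth (Hockney) cost: alpha + beta * bytes.
    """
    data: list
    alpha: float = 5e-6   # assumed per-message latency, seconds
    beta: float = 1e-9    # assumed transfer cost, seconds per byte
    elem_bytes: int = 8   # assumed element size (double precision)
    messages: int = field(default=0, init=False)
    elapsed: float = field(default=0.0, init=False)

    def get(self, lo: int, hi: int) -> list:
        """Fetch data[lo:hi] in one message and add its modeled cost."""
        self.messages += 1
        self.elapsed += self.alpha + self.beta * self.elem_bytes * (hi - lo)
        return self.data[lo:hi]


def fetch_elementwise(remote: RemoteImage, n: int) -> list:
    # Naive translation of a loop like  a(i) = b(i)[k]:
    # one remote access, and therefore one message, per element.
    return [remote.get(i, i + 1)[0] for i in range(n)]


def fetch_vectorized(remote: RemoteImage, n: int) -> list:
    # Vectorized form  a(1:n) = b(1:n)[k]: the whole section moves at once.
    return remote.get(0, n)


if __name__ == "__main__":
    n = 1000
    naive, bulk = RemoteImage(list(range(n))), RemoteImage(list(range(n)))
    assert fetch_elementwise(naive, n) == fetch_vectorized(bulk, n)
    print(f"element-wise: {naive.messages} messages, {naive.elapsed:.2e} s")
    print(f"vectorized:   {bulk.messages} message,  {bulk.elapsed:.2e} s")
```

With n = 1000 and the assumed parameters, the element-wise version sends 1000 messages and pays the latency each time (about 5 ms of modeled time), while the vectorized version sends one message (about 13 µs). Aggregation follows the same reasoning: several small transfers bound for the same image are combined so the per-message latency is paid once.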

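The abstract does not spell out how scalability scores are computed. The sketch below assumes one common formulation of an expectation-based score for strong scaling: if a calling context costs C_p seconds per process on p processes and C_q on q > p processes, perfect strong scaling predicts q·C_q = p·C_p, so the surplus q·C_q − p·C_p, divided by the aggregate run time q·T_q of the larger run, estimates that context's fraction of excess work. `ContextCost`, `strong_scaling_excess`, and the measurements are hypothetical; treat this as an illustration of scoring calling contexts against a scaling expectation, not as the thesis's exact metric.

```python
from dataclasses import dataclass


@dataclass
class ContextCost:
    """Inclusive cost of one calling context, measured per process (hypothetical)."""
    name: str
    cost_p: float  # seconds per process on the smaller run (p processes)
    cost_q: float  # seconds per process on the larger run (q processes)


def strong_scaling_excess(contexts, p, q, total_time_q):
    """Score each calling context by its estimated fraction of excess work.

    Assumed strong-scaling expectation: total work summed over all
    processes stays constant, i.e. q * cost_q == p * cost_p. Any surplus
    is normalized by the aggregate run time of the larger run,
    q * total_time_q. Returns {name: score}, worst offender first.
    """
    scores = {
        c.name: (q * c.cost_q - p * c.cost_p) / (q * total_time_q)
        for c in contexts
    }
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    # Illustrative, made-up measurements; "main" is inclusive of its callees.
    contexts = [
        ContextCost("main", cost_p=9.5, cost_q=3.0),
        ContextCost("solve", cost_p=8.0, cost_q=2.0),
        ContextCost("exchange_halo", cost_p=1.0, cost_q=0.75),
        ContextCost("reduce", cost_p=0.5, cost_q=0.25),
    ]
    ranked = strong_scaling_excess(contexts, p=4, q=16, total_time_q=3.0)
    for name, score in ranked.items():
        print(f"{name:15s} {score:6.1%}")
```

Because the costs are inclusive, the score of `main` equals the sum of its callees' scores, and ranking by score points the analyst at `exchange_halo`, while the perfectly scaling `solve` scores zero. A weak-scaling expectation would instead compare C_q directly against C_p.
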
Keywords/Search Tags:

CAF, High performance, Parallel, Programming models, Scalability

Related items

1	Pattern Of Parallel Programming Research
2	Performance Analysis And Optimization Of Current Parallel Programming Models For Many-core Systems
3	Study On Parallel Programming Models
4	Research On The Performance And Scalability Of Data-Parallel Programming Model On Multicore
5	Supporting high-level, high-performance parallel programming with library-driven optimization
6	Parallel Computing Scalability Studies And Applications On The Distributed Memory Environments
7	Design Of Parallel Systems For Stateful Applications On Clusters
8	Electron Optical System Cad Software For High Performance Computing Research
9	Object-oriented stream programming using aspects: A high-productivity programming paradigm for hybrid platforms
10	Concurrent Programming Patterns For Scalable Web Architecture