High Productivity OpenMP For Distributed Shared Memory Architecture

Posted on:2008-10-09

Degree:Doctor

Type:Dissertation

Country:China

Candidate:C Huang

Full Text:PDF

GTID:1118360242499266

Subject:Computer Science and Technology

Abstract/Summary:

High-end computing has shifted its goal from the pure pursuit of high performance to the realization of high productivity systems, which means improving performance, programmability, portability and robustness while reducing the costs of developing, running and maintaining systems. High productivity computer systems must be supported by high productivity programming environments. Furthermore, the applications targeting future teraflops and petaflops systems are multidisciplinary and multiscale, and their complexity requires domain experts and software scientists from different disciplines to work together on development, management and maintenance. This kind of collaboration places higher demands on the performance, programmability, portability and fault tolerance of programming environments. With features such as easy programmability, support for incremental design patterns, good maintainability and high portability, OpenMP will be the mainstream parallel programming language in the long run.

Focusing on the development of a high productivity OpenMP programming environment for large-scale parallel systems, this thesis systematically investigates key techniques including the implementation of OpenMP on large-scale distributed shared memory (DSM) systems, DSM-oriented OpenMP extensions, compiler-guided data prefetching, checkpoint/restart, OpenMP-oriented low-power optimization and other related techniques. The main contributions of the thesis are as follows.

1. CCRG OpenMP, an OpenMP parallel compiler, has been designed and implemented for large-scale parallel computer systems. We present a shared data placement strategy for OpenMP that coordinates compile time and link time. It not only overcomes the drawback that shared memory must be allocated explicitly in a distributed OS, but also supports data locality optimization for checkpointing. Several source-level optimization techniques are used to improve performance. Experiments show that the performance of CCRG OpenMP on our SCCMP system is equal to that of the Intel compiler 9.1 on SGI Altix.

2. Two OpenMP directives, BARRIER (thread_id) and ALLREDUCTION, are proposed to reduce the rapidly growing overhead of global operations such as barrier and reduction as OpenMP parallel programs scale up, and the algorithms implementing the new directives are given. Experiments show that for a real scientific application, Plasma Physics, performance improves by 76% with 64 threads.

3. A compiler-directed two-stage data prefetch algorithm is presented to overcome the inaccuracy caused by the difference between remote and local access latency. The algorithm is evaluated with a static performance analysis model. Experiments show that the algorithm improves the performance of swim in SPEC OMP2001 by 14% with 32 threads and by 9% with 64 threads.

4. We present system-level and application-level coordinated OpenMP checkpoint/restart mechanisms and a blocked OpenMP checkpoint protocol. Based on these mechanisms, a CCRG OpenMP checkpoint/restart system has been implemented. The system provides complete support for the OpenMP 2.0 API, with good scalability and applicability.

5. Energy optimization techniques are studied based on the OpenMP programming model. Three energy optimization methods and their implementations are presented for parallel systems with dynamic voltage scaling (DVS) capabilities. Barrier-section-based worst-case execution time (WCET) analysis and DVS methods are proposed for WCET-based energy optimization. These methods use the barrier section as the unit of both analysis and voltage scaling, which avoids the effect of barriers on program execution and the energy consumption caused by load imbalance at barriers. An analysis model is built, and simulation shows that these techniques can effectively reduce the energy consumption of parallel systems.
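
The idea of treating a barrier section as the unit of voltage scaling can be illustrated with a small sketch in C. This is not the thesis's algorithm, which the abstract does not spell out. It assumes a simplified textbook model in which supply voltage scales linearly with frequency, so dynamic energy per cycle is proportional to the square of the frequency scale, and it ignores idle power and discrete frequency levels. The function names `barrier_section_scales` and `relative_energy` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: longest per-thread WCET (in cycles) in one barrier section. */
static double max_wcet(const double *wcet, size_t n) {
    double m = 0.0;
    for (size_t i = 0; i < n; ++i)
        if (wcet[i] > m) m = wcet[i];
    return m;
}

/* For each thread, compute a frequency scale in (0, 1] so that every thread
 * needs the same wall-clock time as the slowest one to reach the barrier.
 * A thread with less work runs slower instead of idling at the barrier. */
void barrier_section_scales(const double *wcet, size_t n, double *scale) {
    assert(wcet != NULL && scale != NULL);
    double m = max_wcet(wcet, n);
    for (size_t i = 0; i < n; ++i)
        scale[i] = (m > 0.0) ? wcet[i] / m : 1.0;
}

/* Dynamic energy of the scaled section relative to running every thread at
 * full speed. ASSUMPTION: voltage is proportional to frequency, so energy
 * per cycle is proportional to scale^2. Idle energy is ignored. */
double relative_energy(const double *wcet, size_t n) {
    assert(wcet != NULL);
    double m = max_wcet(wcet, n);
    double scaled = 0.0, full = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double s = (m > 0.0) ? wcet[i] / m : 1.0;
        scaled += wcet[i] * s * s;  /* cycles times energy per cycle */
        full += wcet[i];            /* the same cycles at scale 1.0 */
    }
    return (full > 0.0) ? scaled / full : 1.0;
}
```

For two threads with WCETs of 100 and 50 cycles in one barrier section, the second thread can run at half frequency and still arrive at the barrier on time. Under the stated model the section then uses three quarters of the full-speed dynamic energy. A real system would round each scale up to the nearest supported frequency level.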

Keywords/Search Tags:

High-Productivity, OpenMP, OpenMP Extension, Two-Stage Data Prefetch, Checkpoint/Restart, Low-Energy Optimization

Related items

1	Checkpoint Optimization Based On Active Variable Analysis In OpenMP Programs
2	Research On Compilation And Optimization For OpenMP Programs
3	Research On Performance And Energy Consumption Co-optimization Of OpenMP Program In Power Constrained HPC System
4	Research On Analysis And Optimization For OpenMP Program
5	Design And Implementation Of Open Data Analysis System Based On OpenMP
6	Research On Optimization Technology Of OpenMP For Open64
7	Automatic Offloading And Optimization Of Openmp Programs For Heterogeneous Platforms
8	Research On High Performance Of GRAPES Tangent/Adjoint Model With The MPI/OpenMP
9	Research On OpenMP Towards Cluster Systems
10	Research On Techniques To Improve The Performance Of OpenMP System On Cluster