Compiler Guided Data Distribution For OpenMP Fortran Program

Posted on:2006-12-09

Degree:Master

Type:Thesis

Country:China

Candidate:H C Zhou

Full Text:PDF

GTID:2178360185963273

Subject:Computer Science and Technology

Abstract/Summary:

PDF Full Text Request

Distributed shared memory system is one of the most popular platforms for parallel computing at present. However, the time spent on remote memory accesses is much longer than that on local memory accesses, which is the main bottleneck of OpenMP programs to attain high performance on distributed shared memory system. Reasonable data distribution could improve the locality of data accesses effectively, and reduce the frequency of remote memory accesses. Most of commercial OpenMP compilers have supported the extended data distribution directives, then programmer can use these directives to specify reasonable data distribution. However, it increases the burden of programmer, and is against simplicity and transparency, which are the philosophy of OpenMP. In this thesis, compiler, through analyzing the referenced patterns of the shared arrays in OpenMP program, finds out the distributable array automatically, and inserts the selected data distribution directives into the program implicitly, so as to improve the data locality and scalability of OpenMP programs on distributed memory system.Data distribution is one of the most important approaches to improve the performance of OpenMP programs on distributed memory system. In this thesis, an automatic data distribution algorithm for OpenMP programs is presented and implemented in the CCRG OpenMP compiler. According to the two-steps analysis method, the compiler decides the optimal data distribution pattern for each shared array automatically in a quantitative way. In the first step, the compiler analyzes the data access patterns in each parallel loop nest, then computes a binding function from the accessed array elements to threads space so as to produce a candidate data distribution pattern for each shared array reference according to the OpenMP DO schedule clause; Moreover, we have concluded the necessary and sufficient condition of the compatibility of different data distribution patterns, and the match between binding function and canonical data distribution patterns. In the second step, an automatic numeration method is introduced to make the optimal selection of data distribution pattern based on the convex polytope, which is treated as the geometric model of iteration space and array space. We build the convex polytope composed of the iteration points which access the shared array locally, and compute the number of integer iteration points contained in that convex polytope. Then the result is expressed as the Ehrhart polynomial to indicate the amount of the local accesses which is the criterion of the choice of the best data distribution pattern.Although OpenMP can simplify data distribution, it is still difficult for compiler to make right decisions effectively and efficiently. To the best of our knowledge, nobody else has ever used the counting of polytope to guide data distribution, in this sense, our approach is innovative. Experiments show that under the selected optimal data distribution pattern, the performance of OpenMP programs is obviously better than that under the others.

Keywords/Search Tags:

OpenMP, Data Distribution, Polytope, Ehrhart Polynomial, Compiler

PDF Full Text Request

Related items

1	Research On OpenMP Towards Cluster Systems
2	Research On Techniques To Improve The Performance Of OpenMP System On Cluster
3	Implementation And Optimization Of OpenMP On Heterogeneous Mutil-core Processor
4	The Design And Implementation Of An OpenMP-to-OpenCL Code Automatic Conversion Tool
5	Studies On Data Locality And Compiler Optimization Techniques
6	Research On Compilation And Optimization For OpenMP Programs
7	Automatic Offloading And Optimization Of Openmp Programs For Heterogeneous Platforms
8	Research On Loop Distribution Technology For Shenwei GCC Compiler System
9	High Productivity OpenMP For Distributed Shared Memory Architecture
10	Combinatorial representation theory of affine sl2 via polytope calculus