
Optimization Of Parallel Codes On SMP Clusters

Posted on: 2003-02-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Chen
Full Text: PDF
GTID: 1118360185496928
Subject: Computer system architecture
Abstract/Summary:
Clusters of symmetric multiprocessors (SMPs) have become widespread in high-performance computing. This architecture has the distinctive feature of a hybrid memory model: shared memory within each node and distributed memory among nodes. The hybrid architecture raises questions of programming model and performance. A hybrid programming model that mixes MPI with threads imposes considerable complexity on programmers, so we believe that a unified programming model should be adopted on SMP clusters. In this dissertation, an extension to OpenMP is devised to match the underlying architecture, and an implementation that supports multi-level parallelism is given. In-depth research is carried out on issues particular to this hierarchical architecture.

The main contributions of this dissertation are:

1) Based on the notion of "Distributed OpenMP", an OpenMP extension suited to the cluster architecture is devised. Several execution models are discussed, a general framework is given, and fundamental strategies for determining multi-level parallelism are described.

2) Data and computation partitioning is also important on SMP clusters. A redundant computation partitioning technique is given to optimize inter-node communication. The technique focuses on sequences of parallel loops; the idea is to choose the redundant computation of each parallel loop properly so as to eliminate the inter-loop communication that an ordinary partitioning strategy would introduce (illustrated in the sketch after this list).

3) A new approach based on loop fusion is presented to exploit pipelining parallelism. The technique extracts pipelining from complex loop structures, which distinguishes it from traditional pipelining techniques.

4) Within a global communication optimization framework, redundant message elimination and message coalescing are implemented. The introduction of a globally shadowed area greatly reduces the data redistributions incurred by procedure calls, and interprocedural optimization of regular communications further eliminates redundant messages.

5) The grain of "intra-node" parallelism is important to program performance; an interprocedural algorithm for constructing parallel regions is given.

6) A new synchronization elimination technique is presented, based on static computation partitioning. Two notions are introduced to describe the relationship between references where a data dependence occurs: strong consistency and consistency with respect to the computation partitioning. For each of them, the impact on synchronization is discussed in terms of the "synchronization-free offset". An algorithm is presented to eliminate...
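The redundant computation partitioning idea of contribution 2 can be illustrated with a small, self-contained MPI sketch. This is not code from the dissertation or its compiler; the array names, loop bodies, and two-loop structure are illustrative assumptions. Each rank re-computes one boundary iteration of the first loop so that the second loop, which reads a neighbouring element, needs no inter-node message between the two loops.

```c
/* Minimal sketch (not the dissertation's generated code) of redundant
 * computation partitioning.  Global computation being partitioned:
 *   loop 1: for i in 0..N-1:  b[i] = 2.0 * a[i];
 *   loop 2: for i in 1..N-1:  c[i] = b[i] + b[i-1];
 * With an ordinary block partition, loop 2 on rank r would need b[lo-1]
 * from rank r-1, i.e. a message between the two loops.  Extending loop 1
 * by one redundant iteration per rank removes that message.             */
#include <mpi.h>
#include <stdio.h>

#define N 1024                       /* assumed divisible by the rank count */

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int chunk = N / size;
    int lo = rank * chunk;           /* this rank owns iterations [lo, hi] */
    int hi = lo + chunk - 1;

    /* Full-size arrays are used only for brevity of the sketch; each rank
     * touches just its own block plus one halo element.                  */
    static double a[N], b[N], c[N];

    /* Redundant lower bound: also compute b[lo-1] locally (except rank 0). */
    int rlo = (lo > 0) ? lo - 1 : 0;

    for (int i = rlo; i <= hi; i++)  /* a[] is input data, set up locally  */
        a[i] = (double)i;

    for (int i = rlo; i <= hi; i++)  /* loop 1, with the redundant iteration */
        b[i] = 2.0 * a[i];

    /* Loop 2 reads b[i-1]; thanks to the redundant iteration above, no
     * inter-node communication separates the two loops.                  */
    double local = 0.0;
    for (int i = (lo > 0) ? lo : 1; i <= hi; i++) {
        c[i] = b[i] + b[i - 1];
        local += c[i];
    }

    double total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("checksum = %f\n", total);

    MPI_Finalize();
    return 0;
}
```

In an actual compiler the redundant iteration range would be derived from the dependence distance between the two loops rather than hard-coded to a single element, as in this sketch.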
Keywords/Search Tags: SMP cluster, OpenMP, computation partitioning, shadowed area, software pipelining, parallel region, synchronization elimination