
Parallel Programming With Communication Efficiency On MIC-Enhanced Cluster

Posted on: 2015-10-08 | Degree: Master | Type: Thesis
Country: China | Candidate: X N Dong | Full Text: PDF
GTID: 2348330509460716 | Subject: Computer Science and Technology
Abstract/Summary:
Future exascale systems are expected to adopt compute nodes that incorporate many accelerators. This paper therefore investigates communication-efficient programming on a MIC-enhanced cluster in which each compute node contains multiple Xeon Phi coprocessors. First, to achieve efficient intra-node communication between the host CPU and multiple coprocessors, two offload-mode programming approaches are considered alongside a standard MPI-OpenMP approach, which belongs to the symmetric usage mode. The first offload approach is conventional and uses compiler pragmas, whereas the second is new and combines Intel's Coprocessor Offload Infrastructure (COI) and Symmetric Communication Interface (SCIF) APIs for low-latency communication. While the pragma-based approach allows simpler programming, the COI-SCIF approach has three advantages: lower overhead when launching offloaded code, higher data transfer bandwidth, and more advanced asynchrony between computation and data movement. The low-level COI-SCIF approach is also shown to have benefits over its MPI-OpenMP counterpart. For parallel programming of the whole cluster, different facilities have to work together to fully utilize all of the computing resources. This paper presents a novel communication-efficient hybrid programming model for applications with a stencil computing pattern on a structured grid. The hybrid programming model combines several technologies, including MPI, OpenMP, COI and SCIF, to realize the overall framework design. Moreover, load balancing between the host CPUs and multiple MICs is considered, and a hierarchical pipeline strategy is presented to improve inter-node and intra-node communication. All experiments and tests are based on a real-world 3D application on Tianhe-2 whose core is a stencil computation on a structured grid.
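As a rough illustration of why a stencil computation on a structured grid generates communication, the sketch below cuts a grid into contiguous slabs, one per device, and gives each slab one halo cell from each neighbour before the sweep. It is a deliberately simplified 1D, 3-point averaging stencil in plain Python, not the 3D application or the COI-SCIF code described above. The function names and the cut positions are hypothetical, and copying halo cells here merely stands in for host-coprocessor or inter-node data transfer.

```python
def stencil_step(u):
    """One Jacobi-style sweep of a 3-point averaging stencil.

    The first and last cells are treated as fixed boundary values.
    """
    new = list(u)
    for i in range(1, len(u) - 1):
        new[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return new


def stencil_step_partitioned(u, cuts):
    """Same sweep, with the grid cut into contiguous slabs at indices `cuts`.

    Each slab is extended by one halo cell per neighbouring slab before the
    sweep. Only the cells a slab owns are kept afterwards. (Illustrative
    assumption: one slab per device; the halo copy models communication.)
    """
    n = len(u)
    bounds = [0] + list(cuts) + [n]
    result = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        halo_lo = max(lo - 1, 0)      # one halo cell on the left, if any
        halo_hi = min(hi + 1, n)      # one halo cell on the right, if any
        local = u[halo_lo:halo_hi]    # owned cells plus halo cells
        swept = stencil_step(local)
        start = lo - halo_lo          # offset of the first owned cell
        result.extend(swept[start:start + (hi - lo)])
    return result


if __name__ == "__main__":
    grid = [float(i * i % 7) for i in range(12)]
    # The partitioned sweep reproduces the single-domain sweep.
    print(stencil_step_partitioned(grid, [4, 8]) == stencil_step(grid))
```

Each sweep needs only one halo cell per slab boundary, so communication volume grows with the number of slab surfaces while computation grows with slab volume. That asymmetry is what makes overlapping transfers with computation worthwhile.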
Through a series of detailed experiments, this paper compares the three programming approaches (MPI-OpenMP, pragma-based and COI-SCIF) in terms of both bandwidth and performance, and COI-SCIF shows its benefits over the other two. For multi-node runs, different workload partitioning strategies are tested to find the partition that best balances the load. The effect of the communication optimization on performance is shown by comparing results before and after optimization. Numerical simulation of subcellular Ca2+ dynamics at a resolution down to one nanometer requires enormous computational power. Using the programming approach discussed above, we therefore present simulation results for Ca2+ dynamics in pathological heart cells on Tianhe-2, with the aim of reaching biologically important conclusions together with domain experts. This paper investigates the programming model for a cluster whose nodes each contain multiple MICs. Our findings not only shed some light on the new topic of using multiple accelerators within one compute node, but also provide a good starting point for fully utilizing Tianhe-2 in the future.
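One simple workload partitioning strategy of the kind mentioned above is to assign grid planes to the host CPU and each MIC in proportion to a measured throughput. The sketch below is a minimal Python illustration under that assumption; it is not the partitioning scheme evaluated in the thesis. The function name `partition_planes` and the throughput figures in the usage example are hypothetical.

```python
def partition_planes(n_planes, throughputs):
    """Split n_planes grid planes among devices in proportion to throughput.

    Largest-remainder rounding ensures the counts sum exactly to n_planes.
    `throughputs` holds one relative speed per device (hypothetical values,
    e.g. obtained from a short benchmark run).
    """
    total = sum(throughputs)
    ideal = [n_planes * t / total for t in throughputs]
    counts = [int(x) for x in ideal]          # round every share down first
    leftover = n_planes - sum(counts)
    # Hand the remaining planes to the devices with the largest remainders.
    order = sorted(range(len(ideal)),
                   key=lambda i: ideal[i] - counts[i],
                   reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical node: one host CPU and two MICs, each MIC twice as fast.
    print(partition_planes(100, [1.0, 2.0, 2.0]))  # [20, 40, 40]
```

A static proportional split like this only balances the load if per-plane cost is uniform and the measured throughputs are stable. It also ignores transfer time, which is why the partition is tuned experimentally in practice.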
Keywords/Search Tags:Intel Xeon Phi coprocessor, SCIF, Hybrid programming model with communication efficiency, Tianhe-2