
High performance and scalable MPI intra-node communication middleware for multi-core clusters

Posted on: 2010-02-16
Degree: Ph.D.
Type: Dissertation
University: The Ohio State University
Candidate: Chai, Lei
GTID: 1448390002988075
Subject: Computer Science
Abstract/Summary:
Clusters of workstations are among the most popular architectures in high-performance computing, thanks to their cost-effectiveness. As multi-core technologies become mainstream, more and more clusters use multi-core processors as their building blocks: in the Top500 supercomputer list published in November 2008, about 85% of the sites use multi-core processors from Intel and AMD. The Message Passing Interface (MPI) is one of the most popular programming models for cluster computing. With the increasing deployment of multi-core systems in clusters, a considerable share of communication is expected to take place within a node, which suggests that MPI intra-node communication will play a key role in overall application performance.

This dissertation presents novel MPI intra-node communication designs: a user-level shared-memory-based approach, a kernel-assisted direct copy approach, and an efficient multi-core-aware hybrid approach. The user-level shared-memory-based approach is portable across operating systems and platforms. Processes communicate by copying messages into and out of a shared memory area, whose buffers are organized to make efficient use of the cache and to limit memory usage. The kernel-assisted direct copy approach relies on the operating system kernel to copy a message directly from one process to another, so only one copy is needed instead of two, which improves performance over the shared-memory-based approach. In this approach, the memory copy can be either CPU-based or DMA-based. This dissertation explores both directions; for DMA-based copies, it takes advantage of novel mechanisms such as Intel I/OAT (I/O Acceleration Technology) to achieve better performance and to overlap computation with communication. To optimize performance on multi-core systems, the dissertation efficiently combines the shared-memory approach and the kernel-assisted direct copy approach into a topology-aware and skew-aware hybrid approach.
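To make the two-copy data path of the user-level shared-memory approach concrete, here is a minimal C sketch under stated assumptions: a sender and a receiver created with `fork()` share an anonymous `mmap` region organized as a fixed ring of cells, and each cell carries an atomic full/empty flag. The layout and the names `channel_create`, `channel_send`, `channel_recv`, `NCELLS` and `CELL_BYTES` are hypothetical illustrations, not the buffer organization used in MVAPICH or in the dissertation.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS under strict -std=c11 */
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical sizes, chosen only for illustration. */
#define NCELLS 8
#define CELL_BYTES 4096

struct cell {
    atomic_int full;        /* 0: sender may fill, 1: receiver may drain */
    size_t len;             /* payload length in bytes */
    char data[CELL_BYTES];  /* payload */
};

struct channel {
    struct cell cells[NCELLS];
};

/* Map the ring into memory that remains shared after fork(). */
static struct channel *channel_create(void) {
    void *p = mmap(NULL, sizeof(struct channel), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    struct channel *ch = p;
    for (int i = 0; i < NCELLS; i++)
        atomic_init(&ch->cells[i].full, 0);
    return ch;
}

/* Copy 1: the sender copies its private buffer into the next shared cell.
   Returns -1 if the message does not fit in one cell. */
static int channel_send(struct channel *ch, size_t *head, const void *buf, size_t len) {
    if (len > CELL_BYTES)
        return -1;
    struct cell *c = &ch->cells[*head % NCELLS];
    while (atomic_load_explicit(&c->full, memory_order_acquire))
        sched_yield(); /* ring is full: wait for the receiver to drain this cell */
    memcpy(c->data, buf, len);
    c->len = len;
    atomic_store_explicit(&c->full, 1, memory_order_release); /* publish payload */
    (*head)++;
    return 0;
}

/* Copy 2: the receiver copies the cell out into its own private buffer,
   which must hold at least CELL_BYTES bytes. Returns the payload length. */
static size_t channel_recv(struct channel *ch, size_t *tail, void *buf) {
    struct cell *c = &ch->cells[*tail % NCELLS];
    while (!atomic_load_explicit(&c->full, memory_order_acquire))
        sched_yield(); /* nothing yet: wait for the sender */
    size_t len = c->len;
    memcpy(buf, c->data, len);
    atomic_store_explicit(&c->full, 0, memory_order_release); /* hand cell back */
    (*tail)++;
    return len;
}
```

Each message is copied twice, once into the shared cell and once out of it; the kernel-assisted direct copy approach removes the intermediate copy by having the kernel copy straight from the sender's buffer into the receiver's. The busy-wait with `sched_yield()` keeps the sketch short; a production library would also split messages larger than one cell.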
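The hybrid design can be pictured as a per-message choice between the two data paths. The following C sketch is an assumption-laden illustration, not the dissertation's algorithm: it picks the shared buffer for small messages and the kernel-assisted direct copy for large ones, and it raises the cut-over size when sender and receiver share an L2 cache, on the reasoning that the intermediate copy can then stay in that cache. The type and function names and both threshold values are hypothetical, and skew-awareness is not modeled.

```c
#include <stdbool.h>
#include <stddef.h>

/* The two intra-node data paths described in the abstract. */
enum transport { VIA_SHARED_BUFFER, VIA_KERNEL_DIRECT_COPY };

/* Hypothetical placement record for a sender/receiver pair. */
struct placement {
    int sender_socket, receiver_socket; /* physical processor package of each core */
    int sender_l2, receiver_l2;         /* id of the L2 cache each core uses */
};

/* Illustrative cut-over sizes, NOT values from the dissertation;
   a real library would tune these per platform. */
#define SHARED_CACHE_THRESHOLD (64 * 1024)
#define DEFAULT_THRESHOLD (16 * 1024)

/* Topology check: both cores sit on the same package and use the same L2. */
static bool shares_cache(const struct placement *p) {
    return p->sender_socket == p->receiver_socket && p->sender_l2 == p->receiver_l2;
}

/* Size- and topology-based choice between the two-copy and one-copy paths. */
static enum transport choose_transport(size_t msg_bytes, const struct placement *p) {
    size_t threshold = shares_cache(p) ? SHARED_CACHE_THRESHOLD : DEFAULT_THRESHOLD;
    return msg_bytes < threshold ? VIA_SHARED_BUFFER : VIA_KERNEL_DIRECT_COPY;
}
```

Keeping the policy in one small function makes the thresholds easy to tune per platform without touching either data path.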
The dissertation also presents a comprehensive performance evaluation and analysis of these approaches on contemporary multi-core systems, such as an Intel Clovertown cluster and an AMD Barcelona cluster, both based on quad-core processors.

Software developed as part of this dissertation is available in MVAPICH and MVAPICH2, popular open-source MPI libraries implementing the MPI-1 and MPI-2 standards over InfiniBand and other RDMA-enabled networks, which are used by several hundred top computing sites around the world.
Keywords/Search Tags: MPI intra-node communication, Performance, Kernel assisted direct copy approach, Multi-core, Cluster, Shared memory based approach, Computing