High performance and scalable MPI intra-node communication middleware for multi-core clusters

Posted on:2010-02-16

Degree:Ph.D

Type:Dissertation

University:The Ohio State University

Candidate:Chai, Lei

GTID:1448390002988075

Subject:Computer Science

Abstract/Summary:

Clusters of workstations are among the most popular architectures in high performance computing, thanks to their cost-to-performance effectiveness. As multi-core technologies become mainstream, more and more clusters are deploying multi-core processors as the building unit. In the Top500 supercomputer list published in November 2008, about 85% of the sites use multi-core processors from Intel and AMD. Message Passing Interface (MPI) is one of the most popular programming models for cluster computing. With the increased deployment of multi-core systems in clusters, considerable communication is expected to take place within a node. This suggests that MPI intra-node communication will play a key role in overall application performance.

This dissertation presents novel MPI intra-node communication designs: a user-level shared memory based approach, a kernel assisted direct copy approach, and an efficient multi-core aware hybrid approach. The user-level shared memory based approach is portable across operating systems and platforms. Processes communicate by copying messages into and out of a shared memory area. The shared buffers are organized to make efficient use of cache and memory. The kernel assisted direct copy approach relies on the operating system kernel to copy a message directly from one process to another, so it needs only one copy and improves on the performance of the shared memory based approach. In this approach, the memory copy can be either CPU based or DMA based. This dissertation explores both directions; for DMA based memory copy, we take advantage of novel mechanisms such as I/OAT to achieve better performance and overlap of computation and communication. To optimize performance on multi-core systems, we efficiently combine the shared memory approach and the kernel assisted direct copy approach and propose a topology-aware and skew-aware hybrid approach. The dissertation also presents a comprehensive performance evaluation and analysis of the approaches on contemporary multi-core systems such as an Intel Clovertown cluster and an AMD Barcelona cluster, both of which are based on quad-core processors.

Software developed as part of this dissertation is available in MVAPICH and MVAPICH2, which are popular open-source implementations of MPI-1 and MPI-2 libraries over InfiniBand and other RDMA-enabled networks and are used by several hundred top computing sites around the world.
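
The two-copy pattern behind the user-level shared memory approach can be illustrated with a minimal sketch. It is not the MVAPICH implementation: it uses Python's standard `multiprocessing.shared_memory` module, runs both sides in a single process for simplicity, and the function names (`send_via_shared_buffer`, `recv_from_shared_buffer`, `two_copy_transfer`) and the 4096-byte region size are assumptions made for illustration only.

```python
from multiprocessing import shared_memory


def send_via_shared_buffer(shm_name, payload):
    """Copy 1: the sender copies its private message into the shared region."""
    shm = shared_memory.SharedMemory(name=shm_name)  # attach to existing region
    try:
        if len(payload) > shm.size:
            raise ValueError("payload does not fit in the shared region")
        shm.buf[:len(payload)] = payload
    finally:
        shm.close()
    return len(payload)


def recv_from_shared_buffer(shm_name, nbytes):
    """Copy 2: the receiver copies the message out into its private buffer."""
    shm = shared_memory.SharedMemory(name=shm_name)
    try:
        return bytes(shm.buf[:nbytes])
    finally:
        shm.close()


def two_copy_transfer(payload, region_size=4096):
    """Move a message through a shared region using two memory copies."""
    region = shared_memory.SharedMemory(create=True, size=region_size)
    try:
        n = send_via_shared_buffer(region.name, payload)
        return recv_from_shared_buffer(region.name, n)
    finally:
        region.close()
        region.unlink()  # release the shared region


if __name__ == "__main__":
    print(two_copy_transfer(b"hello intra-node"))
```

Each message here is copied twice, once into the shared region and once out of it. The kernel assisted direct copy approach described above avoids the intermediate buffer by having the kernel copy straight from the sender's memory to the receiver's, which is why it needs only one copy.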

Keywords/Search Tags:

MPI intra-node communication, Performance, Kernel assisted direct copy approach, Multi-core, Cluster, Shared memory based approach, Computing

Related items

1	Research And Application Of Multi-layer Parallel Computing Approach For Finite Element Structural Analysis
2	The Implementation And Performance Of User-level Communication Protocol For Shared Memory Clusters
3	Optimization Of Secondary Shared Memory In Heterogeneous Multi-core Systems For High-density Computing
4	Research On Optimization Of Shared Memory Mechanism Towards High Communication Performance Between Virtual Machines
5	Research On Performance Optimization For Parallel Discrete Event Simulation On Multi-core Cluster
6	The Design And Implementation Of Embedded Multi-core Processor Communication Methods
7	Naplus: A Software Shared Memory For Virtual Clusters
8	An empirical approach to communication and performance modeling for message passing parallel applications on cluster systems
9	Memory Optimization On Chip Multi-core Processors
10	The Design And Implementation Of PSMC And Rapid DMA For Multi-core DSP