Nonblocking Message Passing Based On Deterministic Virtual Memory Model

Posted on: 2017-02-14 | Degree: Master | Type: Thesis
Country: China | Candidate: Q L Zhang
GTID: 2308330485953697 | Subject: Computer software and theory
Abstract/Summary:
On multi-core systems, the interleaved execution of multiple threads or processes is nondeterministic, which can introduce security risks and makes parallel software difficult to develop and debug. To make parallel programs deterministic, we have proposed SPMC, a deterministic producer-consumer shared virtual memory model. On top of SPMC we built message-passing channels and provided the DetMP programming library. We further proposed DetMPI, which implements a subset of MPI with deterministic semantics. Unmodified MPI programs can be built and run deterministically with DetMPI; however, some programs perform poorly under DetMPI compared with a nondeterministic MPI implementation. Experimental results indicate that one root cause of this poor performance is blocking in the underlying SPMC model and in the channels built on it: DetMPI implemented nonblocking MPI communication with blocking channels combined with a message-buffering mechanism. To improve the performance of DetMPI, this thesis extends SPMC and DetMP with nonblocking communication. The main contributions are as follows:

(1) At the SPMC and DetMP levels, additional SPMC primitives and channel APIs that support nonblocking communication are introduced and then used to re-implement communication in DetMPI. We rewrite several communication modules of DetMPI, including point-to-point nonblocking communication and collective communication in the 1:N and N:1 modes.

(2) At the channel level, two nonblocking implementation mechanisms, SelfMP and CothreadMP, are proposed to implement the extended DetMP.
In SelfMP, an MPI process performs both its computation and its communication itself, while in CothreadMP, an MPI process handles its computation and delegates the actual message transmission to a helper thread (the co-thread). Experiments on a 32-core Linux machine show that for 7 OSU MPI collective workloads, the execution times of the original blocking version are about 0.98X to 2.21X those of SelfMP. For the 1:N and N:1 collective communication workloads comp-bcast and comp-gather, CothreadMP is on average about 1.2X faster than SelfMP.

(3) Based on traditional shared virtual memory, a framework of concurrent multicast queues (CMQue) is proposed and used to re-implement the DetMP interface. The goal is to evaluate how various factors affect the performance of DetMP programs: the memory model (SPMC or traditional shared memory), the data-structure organization (sequential or linked list), and the synchronization control (coarse-grained lock, fine-grained lock, or atomic compare-and-swap instruction). Six CMQue implementations are built, covering every combination of the 2 data-structure organizations and the 3 synchronization controls. For the bcast workload, when the producer multicasts messages to 23 consumers on the 32-core system, the execution time of the coarse-grained lock version is about 4.5X that of the fine-grained lock version, about 8X that of the lock-free version, and about 6.2X that of the SPMC version. Moreover, results for PARSEC dedup indicate that the SPMC channel scales best and is about 7X faster than the six CMQue implementations on 32 cores.
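The abstract does not give the signatures of the new SPMC primitives or channel APIs. As a rough illustration of how a nonblocking send differs under the two mechanisms of contribution (2), the following Python sketch assumes an MPI-style interface: `isend` posts a message and returns a request handle at once, and `wait` blocks until that message has been transmitted. The names `Channel`, `Request`, `isend`, `wait`, `recv` and `send_all` are hypothetical and are not DetMP's actual API; ordinary in-process queues stand in for SPMC shared memory, and the sketch does not reproduce SPMC's deterministic memory semantics beyond delivering messages in posting order.

```python
import queue
import threading
from dataclasses import dataclass, field


@dataclass(eq=False)
class Request:
    """Handle returned by a nonblocking send (hypothetical, MPI_Request-like)."""
    payload: object
    done: threading.Event = field(default_factory=threading.Event)


class Channel:
    """One-producer channel with a nonblocking isend (hypothetical API).

    mode="self":     the sending process moves pending messages itself when it
                     calls wait() (SelfMP-style).
    mode="cothread": a helper thread moves messages in the background while the
                     process keeps computing (CothreadMP-style).
    """

    def __init__(self, mode="self"):
        if mode not in ("self", "cothread"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self._pending = queue.Queue()    # posted but not yet transmitted
        self._delivered = queue.Queue()  # stands in for the receiver's buffer
        self._worker = None
        if mode == "cothread":
            self._worker = threading.Thread(target=self._cothread_loop, daemon=True)
            self._worker.start()

    def isend(self, payload):
        """Post a send and return immediately with a request handle."""
        req = Request(payload)
        self._pending.put(req)
        return req

    def _transmit(self, req):
        self._delivered.put(req.payload)
        req.done.set()

    def _cothread_loop(self):
        # Co-thread: transmit requests in posting order until the None sentinel.
        while (req := self._pending.get()) is not None:
            self._transmit(req)

    def wait(self, req):
        """Block until `req` has been transmitted."""
        if self.mode == "self":
            # The caller drives progress, flushing earlier sends first so that
            # messages still arrive in posting order.
            while not req.done.is_set():
                self._transmit(self._pending.get_nowait())
        req.done.wait()

    def recv(self):
        return self._delivered.get()

    def close(self):
        if self._worker is not None:
            self._pending.put(None)  # sentinel stops the co-thread
            self._worker.join()


def send_all(mode, n):
    """Post n nonblocking sends, wait for all of them, return what arrived."""
    ch = Channel(mode)
    reqs = [ch.isend(i) for i in range(n)]
    # ... computation could overlap with transmission here ...
    for r in reqs:
        ch.wait(r)
    received = [ch.recv() for _ in range(n)]
    ch.close()
    return received
```

In the `"self"` mode the caller pays the transmission cost inside `wait`, which mirrors SelfMP; in the `"cothread"` mode a background thread drains posted sends while the caller computes, which mirrors CothreadMP at the price of one extra thread per process.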
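CMQue is described here only at the level of its design space: two data-structure organizations times three synchronization controls, giving six variants. The sketch below shows one of those combinations under stated assumptions: a sequential, append-only array guarded by a single coarse-grained lock, with one read cursor per consumer so that every consumer receives every message in the same order. `MulticastQueue` and `broadcast_demo` are hypothetical names; the fine-grained-lock, compare-and-swap and linked-list variants are not shown, and the sketch never reclaims consumed slots.

```python
import threading


class MulticastQueue:
    """Single-producer, multi-consumer multicast queue (hypothetical sketch).

    Organization: sequential (an append-only Python list).
    Synchronization: one coarse-grained lock (inside a Condition) guards the
    log, every consumer cursor, and the closed flag.
    Each consumer owns a read cursor, so all consumers see every message in
    the same order. None is reserved as the end-of-stream marker, and
    consumed slots are never reclaimed here.
    """

    def __init__(self, n_consumers):
        self._log = []                      # messages in publication order
        self._cursor = [0] * n_consumers    # next index each consumer reads
        self._closed = False
        self._cond = threading.Condition()  # the single coarse-grained lock

    def put(self, item):
        with self._cond:
            self._log.append(item)
            self._cond.notify_all()         # any consumer may be waiting

    def close(self):
        with self._cond:
            self._closed = True
            self._cond.notify_all()

    def get(self, consumer):
        """Next message for `consumer`; None once closed and fully drained."""
        with self._cond:
            while self._cursor[consumer] == len(self._log) and not self._closed:
                self._cond.wait()
            if self._cursor[consumer] == len(self._log):
                return None
            item = self._log[self._cursor[consumer]]
            self._cursor[consumer] += 1
            return item


def broadcast_demo(n_consumers, n_msgs):
    """Producer multicasts 0..n_msgs-1; returns what each consumer received."""
    q = MulticastQueue(n_consumers)
    received = [[] for _ in range(n_consumers)]

    def consume(cid):
        while (msg := q.get(cid)) is not None:
            received[cid].append(msg)

    threads = [threading.Thread(target=consume, args=(c,)) for c in range(n_consumers)]
    for t in threads:
        t.start()
    for i in range(n_msgs):
        q.put(i)
    q.close()
    for t in threads:
        t.join()
    return received
```

With a single coarse-grained lock, the producer and all consumers serialize on the same lock for every access, which is one plausible reason a coarse-grained variant would trail fine-grained or compare-and-swap variants when many consumers read concurrently.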
Keywords/Search Tags: producer-consumer virtual memory, deterministic parallelism, Message Passing Interface, nonblocking, multi-core