Nonblocking Message Passing Based On Deterministic Virtual Memory Model

Posted on:2017-02-14

Degree:Master

Type:Thesis

Country:China

Candidate:Q L Zhang

Full Text:PDF

GTID:2308330485953697

Subject:Computer software and theory

Abstract/Summary:

On multi-core systems, the interleaved execution of multiple threads or processes is nondeterministic, which can cause security risks and makes parallel software hard to develop and debug. To ensure the determinism of parallel programs, we have proposed SPMC, a deterministic producer-consumer shared virtual memory model. Based on SPMC, we built message passing channels and provided the DetMP library for programming. Further, we proposed DetMPI, which implements a subset of MPI with deterministic semantics. Unmodified MPI programs can be built and run deterministically with DetMPI; however, some programs perform poorly under DetMPI compared with a nondeterministic MPI implementation. Experimental results indicate that one root cause of the poor performance is the blocking behavior of the underlying SPMC model and of the channels built on top of it: DetMPI uses blocking channels combined with a message buffering mechanism to implement nonblocking MPI communication. To improve the performance of DetMPI, this thesis extends SPMC and DetMP with nonblocking communication. The main contributions are as follows:

(1) At the SPMC and DetMP levels, additional SPMC primitives and channel APIs that support nonblocking communication are introduced and then used to re-implement some communication modes in DetMPI, including point-to-point nonblocking communication and collective communication in 1:N or N:1 mode.

(2) At the channel level, two different nonblocking mechanisms, SelfMP and CothreadMP, are proposed to implement the extended DetMP. In SelfMP, an MPI process performs both its computation and its communication itself, while in CothreadMP, an MPI process performs its computation and lets a slave thread (co-thread) do the actual message transmission. Experiments on a 32-core Linux machine show that, for 7 OSU MPI collective workloads, the original blocking version takes about 0.98X to 2.21X the execution time of SelfMP. For the 1:N and N:1 collective communication workloads comp-bcast and comp-gather, CothreadMP is nearly 1.2X faster than SelfMP on average.

(3) Based on traditional shared virtual memory, a framework of concurrent multicast queues (CMQue) is proposed and used to re-implement the DetMP interface. This work evaluates how various factors influence the performance of DetMP programs, including the memory model (SPMC or traditional shared memory), the organization of data structures (sequential or linked list) and the synchronization control (coarse-grained lock, fine-grained lock or atomic compare-and-swap instruction). Six CMQue variants are implemented, one for each combination of the 2 data structure organizations and the 3 synchronization controls. For the bcast workload, when the producer multicasts messages to 23 consumers on the 32-core system, the execution time of the coarse-grained lock version is about 4.5X that of the fine-grained lock version, about 8X that of the lock-free version and about 6.2X that of the SPMC version. Moreover, the experimental results for PARSEC dedup indicate that the SPMC channel has the best scalability and is about 7X faster than the 6 CMQue implementations on 32 cores.
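To make the CMQue design space in contribution (3) more concrete, the following is a minimal sketch of one point in it: a multicast queue with sequential storage guarded by a single coarse-grained lock. It is an illustration only, not the thesis's implementation; the names MulticastQueue, send, recv and run_demo are hypothetical, and the sketch never reclaims consumed messages.

```python
import threading


class MulticastQueue:
    """Hypothetical single-producer, multi-consumer multicast queue.

    Every consumer receives every message, in the order the producer sent them.
    """

    def __init__(self, num_consumers):
        self._items = []                     # sequential (array-like) storage
        self._cursors = [0] * num_consumers  # next index each consumer will read
        self._cond = threading.Condition()   # one coarse-grained lock for all state

    def send(self, msg):
        # Producer appends under the lock and wakes all waiting consumers.
        with self._cond:
            self._items.append(msg)
            self._cond.notify_all()

    def recv(self, consumer_id):
        # Consumer blocks until its own cursor has an unread message.
        with self._cond:
            while self._cursors[consumer_id] >= len(self._items):
                self._cond.wait()
            msg = self._items[self._cursors[consumer_id]]
            self._cursors[consumer_id] += 1
            return msg


def run_demo(num_consumers=3, num_msgs=5):
    """Multicast num_msgs integers and return what each consumer received."""
    q = MulticastQueue(num_consumers)
    results = [[] for _ in range(num_consumers)]

    def consumer(cid):
        for _ in range(num_msgs):
            results[cid].append(q.recv(cid))

    threads = [threading.Thread(target=consumer, args=(c,))
               for c in range(num_consumers)]
    for t in threads:
        t.start()
    for i in range(num_msgs):
        q.send(i)
    for t in threads:
        t.join()
    return results
```

Because all consumers and the producer contend for the same lock, a design like this serializes every send and receive. The fine-grained and compare-and-swap variants named in the abstract are alternative synchronization choices over the same interface.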

Keywords/Search Tags:

producer-consumer virtual memory, deterministic parallelism, Message Passing Interface, nonblocking, multi-core

Related items

1	Research On Key Techniques Of Deterministic Multiprocessing Targeting Multicore/manycore Architectures
2	Research On Virtual Machine Based Deterministic Execution Of Multi-core System
3	Research On Memory-level Parallelism For Multi-core Microprocessor Chip
4	A performance comparison: MPICH, message passing interface against Treadmarks, distributed shared memory
5	Proof Method And Application Of Deterministic Parallel Programming Model
6	Research Of Multi-core Parallel Compiler Based On The Characteristic Function Library
7	Design And Implementation Of Compiler Directives For Tasks Parallelism In Message Passing Computing
8	Research Of Memory Race Recording Mechanism In Deterministic Multi-Core Replay Based On SPARC Architecture
9	The Research Of Opto-electronic Hybrid Interconnect Many-core Architecture Using Message Passing Communication Mechanism
10	Research On Stream Program Virtual Machine For Multi-core Processor With Distributed Memory