Font Size: a A A

Scalable formal dynamic verification of MPI programs through distributed causality tracking

Posted on:2012-02-04Degree:Ph.DType:Thesis
University:The University of UtahCandidate:Vo, AnhFull Text:PDF
GTID:2458390011452155Subject:Computer Science
Abstract/Summary:
Almost all high performance computing applications are written in MPI, which will continue to be the case for at least the next several years. Given the huge and growing importance of MPI, and the size and sophistication of MPI codes, scalable and incisive MPI debugging tools are essential. Existing MPI debugging tools have, despite their strengths, many glaring deficiencies, especially when it comes to debugging under the presence of nondeterminism related bugs, which are bugs that do not always show up during testing. These bugs usually become manifest when the systems are ported to different platforms for production runs.;This dissertation focuses on the problem of developing scalable dynamic verification tools for MPI programs that can provide a coverage guarantee over the space of MPI nondeterminism. That is, the tools should be able to detect different outcomes of nondeterministic events in an MPI program and enforce all those different outcomes through repeated executions of the program with the same test harness.;We propose to achieve the coverage guarantee by introducing efficient distributed causality tracking protocols that are based on the matches-before order. The matchesbefore order is introduced to address the shortcomings of the Lamport happens-before order [40], which is not sufficient to capture causality for MPI program executions due to the complexity of the MPI semantics. The two protocols we propose are the Lazy Lamport Clocks Protocol (LLCP) and the Lazy Vector Clocks Protocol (LVCP). LLCP provides good scalability with a small possibility of missing potential outcomes of nondeterministic events while LVCP provides full coverage guarantee with a scalability tradeoff. In practice, we show through our experiments that LLCP provides the same coverage as LVCP.;This thesis makes the following contributions: (1) The MPI matches-before order that captures the causality between MPI events in an MPI execution. (2) Two distributed causality tracking protocols for MPI programs that rely on the matches-before order. (3) A Distributed Analyzer for MPI programs (DAMPI), which implements the two aforementioned protocols to provide scalable and modular dynamic verification for MPI programs. (4) Scalability enhancement through algorithmic improvements for ISP, a dynamic verifier for MPI programs.
Keywords/Search Tags:MPI programs, Distributed causality tracking, Dynamic, MPI debugging tools, Scalable, LLCP provides
Related items