
Memory systems for parallel programming

Posted on: 1997-02-13
Degree: Ph.D
Type: Thesis
University: The University of Wisconsin - Madison
Candidate: Richards, Bradley Eric
GTID: 2468390014980113
Subject: Computer Science
Abstract/Summary:
Distributed Shared-Memory (DSM) computers, which partition physical memory among a collection of workstation-like computing nodes, are emerging as the way to implement parallel computers because they promise scalability and high performance. Shared-memory DSM machines use a coherence protocol to manage the replication of data and to ensure that a parallel program sees a consistent view of memory.

Applications have very different patterns of communication, and no single, general-purpose protocol suits all programs. This observation has prompted interest in systems in which the protocol is implemented in flexible software rather than fixed in hardware. DSM machines with software-implemented coherence protocols open the door to a variety of more complex, application-specific protocols, including protocols that do more than keep memory consistent: they can also provide new functionality and semantics.

Parallel programming has long faced a tension between the goals of high performance and ease of use. Languages and tools can make parallel computers easier to use, but concerns about their efficiency have limited their adoption. This thesis demonstrates that some high-level languages and tools can be implemented more efficiently by taking advantage of the cache-coherence protocols that underlie software DSM machines, thereby improving both performance and ease of use.

This thesis describes a family of custom protocols that efficiently implement C**, a large-grain data-parallel language. On programs for which static analysis is imprecise, these Loosely Coherent Memory (LCM) protocols improve performance by a few percent up to a factor of 3 and reduce memory overheads by factors of 2 to 5 relative to a compiler-copying scheme.
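To make the coherence-protocol idea concrete, the sketch below shows a toy directory-based protocol with Modified/Shared/Invalid states. It is an illustrative sketch only, not the protocol machinery used in the thesis; the class and method names (`Directory`, `read`, `write`, `state`) are hypothetical.

```python
# Toy directory-based MSI coherence protocol: a minimal sketch for
# intuition only, not the thesis's actual protocol.
# Per-block states at a node: "M" (modified), "S" (shared), "I" (invalid).

class Directory:
    """Tracks, per memory block, which nodes hold copies and in what state."""

    def __init__(self):
        self.sharers = {}   # block -> set of nodes holding a Shared copy
        self.owner = {}     # block -> node holding the Modified copy, if any

    def read(self, node, block):
        # Read miss: downgrade any Modified owner to Shared, then
        # add the requester to the sharer set.
        if block in self.owner:
            self.sharers.setdefault(block, set()).add(self.owner.pop(block))
        self.sharers.setdefault(block, set()).add(node)

    def write(self, node, block):
        # Write miss: invalidate all other copies; requester becomes owner.
        self.sharers.pop(block, None)
        self.owner[block] = node

    def state(self, node, block):
        if self.owner.get(block) == node:
            return "M"
        if node in self.sharers.get(block, set()):
            return "S"
        return "I"
```

In a software DSM system, handlers like these run in software on each node, which is what makes it possible to replace them with application-specific variants tuned to a program's sharing pattern.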
LCM also improves performance by up to a factor of 3 in C-code programs.

This thesis also presents custom cache-coherence protocols that perform on-the-fly detection of actual data races in programs with barrier synchronization. Execution-time overheads for the race-detection protocols ranged from zero to less than a factor of three, a significant improvement over comparable approaches, and the race-detection protocols found actual program errors in two applications.
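For barrier-synchronized programs, on-the-fly race detection reduces to a per-epoch check: two accesses to the same location by different processors within one barrier epoch, at least one of them a write, constitute a race. The sketch below shows that check with explicit per-location tables; it is a simplified illustration under that assumption (hypothetical names throughout), whereas the thesis piggybacks the bookkeeping on the coherence protocol itself.

```python
# Minimal on-the-fly race check for barrier-synchronized programs:
# within one barrier epoch, accesses to the same location from different
# processors race if at least one of them is a write. A sketch only;
# not the thesis's protocol code.

class RaceDetector:
    def __init__(self):
        self.readers = {}   # location -> processors that read it this epoch
        self.writer = {}    # location -> processor that wrote it this epoch
        self.races = []     # reported (location, proc_a, proc_b) triples

    def _check_prior_write(self, proc, loc):
        w = self.writer.get(loc)
        if w is not None and w != proc:
            self.races.append((loc, w, proc))

    def on_read(self, proc, loc):
        self._check_prior_write(proc, loc)
        self.readers.setdefault(loc, set()).add(proc)

    def on_write(self, proc, loc):
        self._check_prior_write(proc, loc)
        for r in self.readers.get(loc, set()):
            if r != proc:
                self.races.append((loc, r, proc))
        self.writer[loc] = proc

    def on_barrier(self):
        # The barrier orders all earlier accesses before all later ones,
        # so per-epoch state can be discarded.
        self.readers.clear()
        self.writer.clear()
```

Because the barrier orders everything across epochs, clearing the tables at each barrier keeps the cost proportional to the locations actually touched in an epoch.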
Keywords/Search Tags: Memory, Protocols, DSM, Parallel