
Software support for ordering memory operations in parallel systems

Posted on: 2013-09-08
Degree: Ph.D.
Type: Dissertation
University: Purdue University
Candidate: Fang, Xing
GTID: 1458390008985493
Subject: Engineering
Abstract/Summary:
Parallel processing is essential to exploiting the potential of multi-core processors. Correct and efficient programming for parallel machines is a notoriously difficult job, done well by only a few select, well-trained programmers. However, parallel platforms are becoming ubiquitous, requiring far more programs to be written by regular programmers. This motivates the implementation of new parallel programming paradigms that are efficient and easy to reason about and use.

Modern processors implement relaxed memory models when used as part of a shared memory system, that is, a system in which loads and stores that do not reference the same memory location are allowed to execute in a different order than they appear in the program. Programming languages implement memory (or consistency) models that require additional memory references to execute in order, beyond those that the relaxed-consistency processor already guarantees to execute in order; that is, they have a stricter memory model. An extreme example of a stricter memory model is the sequentially consistent memory model. Many consider a stricter model easier to reason about than a relaxed one.

Current processors provide fence instructions that allow these stricter orders to be enforced. We present a flow-based fence insertion algorithm for effectively enforcing the required orders. This algorithm is implemented in the Pensieve-Jikes compiler, and data showing its effectiveness is provided.

New architectures have been proposed that aim to support high-performance sequential consistency by committing groups of instructions (chunks) at one time. Aggressive compiler support is required to break programs into reasonably sized groups at strategic places, attaining a high-performance sequentially consistent environment. In the second half of this dissertation, we present in detail a compiler algorithm and implementation that performs full-program automatic formation of chunks for such a blocked architecture.
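To make the chunk-formation problem concrete, here is a minimal, hypothetical Python sketch. It is not the dissertation's algorithm: it assumes straight-line code, a fixed per-chunk size limit (`max_size`), and a caller-supplied set of preferred cut points standing in for the "strategic places" mentioned above. The function `form_chunks` and its parameters are invented for illustration. It greedily fills each chunk up to the size limit, then pulls the boundary back to the latest preferred cut point if one is available.

```python
def form_chunks(instrs, max_size, preferred_cuts):
    """Greedily split a straight-line instruction list into chunks.

    Hypothetical sketch, not the dissertation's chunking algorithm.
    instrs:         instruction labels in program order.
    max_size:       assumed hardware limit on instructions per chunk.
    preferred_cuts: indices i where a boundary just before instrs[i] is
                    considered a "strategic" place to end a chunk.
    """
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    chunks, start = [], 0
    while start < len(instrs):
        end = min(start + max_size, len(instrs))
        if end < len(instrs):
            # Take the latest strategic cut that keeps the chunk non-empty
            # and no larger than max_size; otherwise cut at max_size.
            candidates = [i for i in preferred_cuts if start < i <= end]
            if candidates:
                end = max(candidates)
        chunks.append(instrs[start:end])
        start = end
    return chunks


program = [f"i{k}" for k in range(10)]
chunks = form_chunks(program, max_size=4, preferred_cuts={3, 6})
print([len(c) for c in chunks])  # [3, 3, 4]
```

On a blocked architecture, each such chunk would commit as a unit. A real compiler pass must also handle control flow, loops, and synchronization, all of which this sketch deliberately ignores.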
We show, for the first time, that fully automatic techniques with no programmer intervention provide a sequentially consistent system that performs better than conventional machines with relaxed memory models. For eight full Java programs, compiler-generated code running on a simulated 4-processor blocked architecture that supports sequential consistency runs, on average, 5% faster than code on a conventional architecture supporting the more relaxed Java memory model.
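The gap between sequential consistency and a relaxed model, and how a fence closes it, can be illustrated with the classic store-buffering litmus test: each thread stores to one location and then loads the other. The Python sketch below is a hypothetical model, not the Pensieve-Jikes implementation or the dissertation's fence insertion algorithm. It assumes a simplified relaxed model in which a thread's accesses to different locations may take effect in either order unless a fence separates them, and it enumerates every resulting execution. The operation encoding and the function names (`outcomes`, `thread_orders`, `must_stay_ordered`) are invented for illustration.

```python
from itertools import permutations

# Hypothetical encoding of a thread's operations:
#   ("st", location, value), ("ld", location, register), ("fence",)
SB_THREAD0 = [("st", "x", 1), ("ld", "y", "r1")]
SB_THREAD1 = [("st", "y", 1), ("ld", "x", "r2")]
FENCED0 = [("st", "x", 1), ("fence",), ("ld", "y", "r1")]
FENCED1 = [("st", "y", 1), ("fence",), ("ld", "x", "r2")]


def must_stay_ordered(ops, i, j, relaxed):
    """True if memory op i must take effect before memory op j (i < j)."""
    if not relaxed:
        return True  # sequential consistency keeps all of program order
    same_location = ops[i][1] == ops[j][1]
    fenced = any(op[0] == "fence" for op in ops[i + 1:j])
    return same_location or fenced


def thread_orders(ops, relaxed):
    """Yield every order in which one thread's memory ops may take effect."""
    mem = [k for k, op in enumerate(ops) if op[0] != "fence"]
    for perm in permutations(mem):
        pos = {k: n for n, k in enumerate(perm)}
        if all(pos[i] < pos[j]
               for a, i in enumerate(mem) for j in mem[a + 1:]
               if must_stay_ordered(ops, i, j, relaxed)):
            yield [ops[k] for k in perm]


def interleavings(a, b):
    """Yield all merges of a and b that preserve each list's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def outcomes(t0, t1, relaxed):
    """Return the (r1, r2) values the store-buffering program can produce."""
    results = set()
    for o0 in thread_orders(t0, relaxed):
        for o1 in thread_orders(t1, relaxed):
            for schedule in interleavings(o0, o1):
                memory, regs = {"x": 0, "y": 0}, {}  # both locations start at 0
                for kind, loc, arg in schedule:
                    if kind == "st":
                        memory[loc] = arg
                    else:
                        regs[arg] = memory[loc]
                results.add((regs["r1"], regs["r2"]))
    return results


print(sorted(outcomes(SB_THREAD0, SB_THREAD1, relaxed=False)))
# [(0, 1), (1, 0), (1, 1)]   sequential consistency never yields (0, 0)
print(sorted(outcomes(SB_THREAD0, SB_THREAD1, relaxed=True)))
# [(0, 0), (0, 1), (1, 0), (1, 1)]   a load may bypass the earlier store
print(sorted(outcomes(FENCED0, FENCED1, relaxed=True)))
# [(0, 1), (1, 0), (1, 1)]   the fences restore the SC outcomes
```

The (0, 0) result is exactly the kind of reordering that a stricter language-level model forbids and that fence insertion must rule out. A compiler's job is to place as few fences as possible while still excluding such outcomes.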
Keywords/Search Tags: Memory, Parallel, Support, Relaxed, Order, Consistency