Software support for ordering memory operations in parallel systems

Posted on:2013-09-08

Degree:Ph.D

Type:Dissertation

University:Purdue University

Candidate:Fang, Xing

GTID:1458390008985493

Subject:Engineering

Abstract/Summary:

Parallel processing is essential to exploiting the potential of multi-core processors. Correct and efficient programming for parallel machines is a notoriously difficult job done well by only a few select, well-trained programmers. However, parallel platforms are becoming ubiquitous, requiring far more programs to be written by regular programmers. This motivates the implementation of new parallel programming paradigms that are efficient and easy to reason about and use.

Modern processors implement relaxed memory models when used as part of a shared memory system, that is, models in which loads and stores that do not reference the same memory location are allowed to execute in a different order than they appear in the program. Programming languages implement memory (or consistency) models that require additional memory references to be executed in order, beyond those guaranteed to execute in order by the processor's relaxed consistency model; that is, they have a stricter memory model. An extreme example of a stricter memory model is the sequentially consistent memory model. A stricter model is thought by many to be easier to reason about than a relaxed model.

Current processors provide fence instructions that allow these stricter orders to be enforced. We present a flow-based fence insertion algorithm for effectively enforcing the orders required. This algorithm is implemented in the Pensieve-Jikes compiler. Data showing the effectiveness of the algorithm is provided.

New architectures have been proposed that aim to support high-performance sequential consistency by committing groups of instructions (chunks) at one time. Aggressive compiler support is required to break programs into reasonably sized groups at strategic places, attaining a high-performance sequentially consistent environment. In the second half of this dissertation we present in detail a compiler algorithm and implementation that performs full-program automatic formation of chunks for such a blocked architecture. We show, for the first time, that fully automatic techniques with no programmer intervention provide a sequentially consistent system that has higher performance than conventional machines with relaxed memory models. For 8 full Java codes, we show that compiler-generated code running on a simulated 4-processor blocked architecture that supports sequential consistency runs on average 5% faster than code on a conventional architecture supporting the more relaxed Java memory model.
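
The role of a fence in enforcing a stricter order can be illustrated with the classic store-buffering litmus test. The sketch below is not the dissertation's fence insertion algorithm, which works inside the compiler; it only shows by hand what a fence between a store and a later load to a different location buys. The class name `StoreBufferingDemo` and the method `countBothZero` are hypothetical, and the example assumes Java 9 or later, where `java.lang.invoke.VarHandle.fullFence()` is available.

```java
import java.lang.invoke.VarHandle;

public class StoreBufferingDemo {
    // Plain (non-volatile) shared fields: the accesses themselves imply no ordering.
    static int x, y;
    static int r1, r2;

    /**
     * Runs the store-buffering test `iterations` times and returns how often
     * both threads read 0, an outcome that sequential consistency forbids.
     */
    static int countBothZero(int iterations, boolean useFence) {
        int bothZero = 0;
        for (int i = 0; i < iterations; i++) {
            x = 0; y = 0; r1 = -1; r2 = -1;
            Thread t1 = new Thread(() -> {
                x = 1;                                  // store to x
                if (useFence) VarHandle.fullFence();    // keep the store ahead of the load
                r1 = y;                                 // load from y
            });
            Thread t2 = new Thread(() -> {
                y = 1;                                  // store to y
                if (useFence) VarHandle.fullFence();    // keep the store ahead of the load
                r2 = x;                                 // load from x
            });
            t1.start();
            t2.start();
            try {
                // join() makes the threads' writes to r1 and r2 visible here.
                t1.join();
                t2.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for test threads", e);
            }
            if (r1 == 0 && r2 == 0) bothZero++;
        }
        return bothZero;
    }

    public static void main(String[] args) {
        System.out.println("with fences, (0,0) outcomes: " + countBothZero(1000, true));
        System.out.println("without fences, (0,0) outcomes: " + countBothZero(1000, false));
    }
}
```

With the fences in place, neither thread's load can be reordered before its own store, so the (0,0) outcome should not appear. Without them, the hardware or the JIT compiler is free to reorder the two accesses, although the outcome may still be rare in practice because thread start-up costs dominate the timing.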

Keywords/Search Tags:

Memory, Parallel, Support, Relaxed, Order, Consistency

Related items

1	Studies On Shared-Memory Management And Optimization Technologies In Parallel And Distributed Operating Systems
2	Research And Implementation Of Consistency Mechanism Of In-memory File System Based On RDMA And NVM
3	Limitations and capabilities of weak memory consistency systems
4	Consistency model transitions in shared memory
5	High-Performance Consistency For Secure Non-Volatile Memory Systems
6	Study Of Change-Oriented Service Version Consistency Checking Method
7	Multi-machine Interconnected In The Smp Environment
8	Research On Metadata Crash Consistency Guarantee Mechanism In Secure Non-Volatile Memory
9	Research And Implementation Of Memory Consistency Model In X Multi-Processor
10	An Operational Relaxed Memory Model And Program Logic For Concurrency Verification