Font Size: a A A

Lightweight and Low-Cost Mechanisms to Enable Parallel Monitoring of Multithreaded Applications

Posted on:2014-03-22Degree:Ph.DType:Dissertation
University:Carnegie Mellon UniversityCandidate:Vlachos, EvangelosFull Text:PDF
GTID:1458390005485566Subject:Engineering
Abstract/Summary:
The goals of this dissertation are i) to identify the requirements for correct and efficient online instruction-grain monitoring of parallel applications, and ii) design and evaluate the required mechanisms that enable efficient monitoring of parallel applications. As far as the requirements are concerned, the first one states that monitoring must process the application events in the same order that the events followed luring execution. The second requirement states that the monitoring process, being a parallel process itself or performance reasons, has the same requirements with respect to atomicity of shared state.;For the ordering requirement, we propose to record application cache coherence activity in order to infer the order of concurrent application instructions (e.g., inter-thread dependences). The recorded information is then used by the monitoring process to properly direct monitoring tasks. Depending on the memory model deployed in the system, there might be cycles of dependences due to the instruction re-orderings allowed. We study these cycles and present a software-based algorithm that dictates how monitoring can reason about the order of application events under relaxed memory models. Furthermore, we identify for the first time that the relative order of high-level application events (e.g., malloc ( ) ) must also been exposed to the monitoring analysis, in order to identify the outcome of logical races. We extend our order capturing mechanism to send and receive ConflictAlert messages that mark the execution of a high-level event with respect to the execution of other application threads.;For the atomicity requirement, we study a diverse set of monitors and categorize them into two possible groups based on when they update shared state. The first group updates shared state when processing store instructions (apart from high-level events), while the second group may update shared state when processing both loads and store instructions. The first group avoids the use of high-level synchronization primitives (e.g., locks), as the use of ordering information provides the required synchronization effects. For the second group we propose a novel organization, where locks are introduced only on the code responsible for processing load instructions. Furthermore, we study how the memory model of the system affects atomicity properties by delaying store instructions that advertise monitors' progress, and propose a lightweight hardware mechanisms to ensure correct progress updates.;Finally, to enhance the efficiency of our monitoring mechanisms, we study a previously proposed set of monitoring accelerators that were initially targeted on serial monitoring. We identify their limitations and their mismatch with the parallel world, proposing the required set of modifications to support parallel applications.;Overall, we propose a set of software and hardware mechanisms that collectively enable efficient online parallel monitoring of multithreaded applications. We evaluate our mechanisms in the context of a decoupled monitoring architecture (LBA), and report a monitoring speedup ranging from 24X -- 36X for the monitors studied over serial monitoring, and a monitoring speedup of 1.13X -- 10X over unaccelerated parallel monitoring. We further show that cycles of dependences are prevalent but infrequent, allowing for software-based cycle resolution, which has no noticeable effects on monitoring performance. (Abstract shortened by UMI.).
Keywords/Search Tags:Monitoring, Parallel, Mechanisms, Application, Enable, Shared state, Identify
Related items