
Analyzing and Accelerating Runtime Systems on Multicore Architecture

Posted on: 2014-12-25
Degree: Ph.D.
Type: Dissertation
University: North Carolina State University
Candidate: Tiwari, Devesh
Full Text: PDF
GTID: 1458390005995621
Subject: Engineering
Abstract/Summary:
Technology scaling has made multicore architectures commercially prevalent. However, exploiting multicore parallelism for performance remains challenging for programmers because of the side effects of parallel programming, such as concurrency management, data races, and deadlocks. There is therefore a need for solutions that exploit the computational power of multicore platforms without burdening programmers with concurrency management. In this dissertation, we use two specific runtime systems (a dynamic memory management runtime system and a shared memory MapReduce runtime system) as vehicles to demonstrate that runtime systems that exploit multicore parallelism transparently can be part of the solution.

First, we show how to reduce the overhead of a dynamic memory management runtime system by parallelizing it on a multicore architecture. Traditionally, dynamic memory management runtime systems execute sequentially and therefore cannot take advantage of multicore platforms. Moreover, tasks such as malloc and free are often very small, so executing them in parallel on a separate core may even degrade performance due to high communication and synchronization costs. We use the dynamic memory management runtime system as an example of how to efficiently exploit such fine-grained parallelism, and we present the design of a system that exploits this fine-grained parallelism in the runtime library while remaining transparent to both the application and the memory allocation library, without modifying either.

Second, we focus on analyzing and optimizing a shared memory MapReduce runtime system. Shared memory MapReduce runtime systems allow programmers to express parallelism at a higher level and provide automatic management of concurrency. However, because of this high level of abstraction, programmers are often unaware of the performance bottlenecks of such runtime systems and may achieve only suboptimal performance gains. To address this, we build a new analytical model that captures the key performance factors of shared memory MapReduce runtime libraries and reveals several previously unknown and non-intuitive performance trends. Findings and insights from our analytical model can help both programmers and system designers understand and explain the performance bottlenecks of these runtime systems.

Finally, we optimize the shared memory MapReduce runtime system design for the common case where programs are run multiple times with identical or slightly changed input, which presents a significant opportunity for computation reuse. We propose a novel technique for computation reuse in shared memory MapReduce runtime systems, which we refer to as MapReuse. MapReuse detects input similarity by comparing input signatures. It skips re-computing output for repeated portions of the input, computes output for new portions of the input, and removes output that corresponds to deleted portions of the input. We show that MapReuse achieves significant performance improvement in these scenarios while leaving the underlying shared memory MapReduce runtime largely unmodified.
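To make the fine-grained offloading idea concrete, the following is a minimal sketch, not the dissertation's actual design, of one way an allocator runtime could hand free() requests to a helper thread running on a spare core. The deferred_free/free_worker names, the bounded ring buffer, and the mutex-based hand-off are all illustrative assumptions.

    /* Hypothetical sketch: offloading free() work to a helper thread.
     * Queue layout, names, and synchronization are illustrative only. */
    #include <pthread.h>
    #include <stdlib.h>

    #define QCAP 1024

    static void *queue[QCAP];
    static int head, tail, count;
    static int done;  /* a shutdown path would set this and broadcast */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;
    static pthread_cond_t not_full  = PTHREAD_COND_INITIALIZER;

    /* Application threads call this instead of free(); the pointer is queued. */
    void deferred_free(void *p)
    {
        pthread_mutex_lock(&lock);
        while (count == QCAP)
            pthread_cond_wait(&not_full, &lock);
        queue[tail] = p;
        tail = (tail + 1) % QCAP;
        count++;
        pthread_cond_signal(&not_empty);
        pthread_mutex_unlock(&lock);
    }

    /* Helper thread, conceptually pinned to a spare core, drains the queue
     * and performs the actual free() calls off the application's critical path. */
    void *free_worker(void *arg)
    {
        (void)arg;
        for (;;) {
            pthread_mutex_lock(&lock);
            while (count == 0 && !done)
                pthread_cond_wait(&not_empty, &lock);
            if (count == 0 && done) {
                pthread_mutex_unlock(&lock);
                return NULL;
            }
            void *p = queue[head];
            head = (head + 1) % QCAP;
            count--;
            pthread_cond_signal(&not_full);
            pthread_mutex_unlock(&lock);
            free(p);  /* allocator work now runs on the helper core */
        }
    }

Note that per-pointer locking as written would reintroduce exactly the communication and synchronization cost the abstract warns about; any realistic design would batch many pointers per hand-off or use a lock-free channel.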
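The MapReuse description suggests a chunk-level signature comparison between the previous and current inputs. The sketch below is our own illustration, not the dissertation's implementation: it hashes fixed-size input chunks with FNV-1a and classifies each chunk as reusable, new, or deleted. The chunk size, hash choice, and positional comparison are simplifying assumptions.

    /* Hypothetical sketch of signature-based input-similarity detection.
     * Chunking, hashing, and positional comparison are illustrative choices. */
    #include <stdint.h>
    #include <stdio.h>

    #define CHUNK 4096

    /* FNV-1a hash of one input chunk: the chunk's "signature". */
    static uint64_t signature(const char *buf, size_t len)
    {
        uint64_t h = 1469598103934665603ULL;
        for (size_t i = 0; i < len; i++) {
            h ^= (unsigned char)buf[i];
            h *= 1099511628211ULL;
        }
        return h;
    }

    /* Compute per-chunk signatures for an in-memory input buffer. */
    size_t chunk_signatures(const char *buf, size_t len, uint64_t *out, size_t max)
    {
        size_t n = 0;
        for (size_t off = 0; off < len && n < max; off += CHUNK, n++) {
            size_t span = len - off < CHUNK ? len - off : CHUNK;
            out[n] = signature(buf + off, span);
        }
        return n;
    }

    /* Compare old and new per-chunk signatures. Matching chunks reuse saved
     * output; changed or new chunks are recomputed; chunks present only in
     * the old input have their output removed. */
    void classify_chunks(const uint64_t *old_sig, size_t n_old,
                         const uint64_t *new_sig, size_t n_new)
    {
        size_t n = n_old < n_new ? n_old : n_new;
        for (size_t i = 0; i < n; i++) {
            if (old_sig[i] == new_sig[i])
                printf("chunk %zu: reuse saved output\n", i);
            else
                printf("chunk %zu: recompute output\n", i);
        }
        for (size_t i = n; i < n_new; i++)
            printf("chunk %zu: new input, compute output\n", i);
        for (size_t i = n; i < n_old; i++)
            printf("chunk %zu: deleted input, remove output\n", i);
    }

A runtime using this scheme would persist the signature array and per-chunk outputs across runs, so that on the next invocation only the chunks classified as new or changed reach the map phase.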
Keywords/Search Tags: Runtime systems, Shared memory MapReduce, Multicore, Performance, Programmers, Parallelism, Input, Exploit