Support for dynamic management of parallelism in chip multiprocessors

Posted on:2009-08-08

Degree:Ph.D

Type:Dissertation

University:Princeton University

Candidate:Contreras Salas, Gilberto

GTID:1448390002994486

Subject:Engineering

Abstract/Summary:

In recent years, the microprocessor industry has been revolutionized by the introduction of the chip multiprocessor (CMP). Created as an alternative to single-core designs, CMPs promise to mitigate two of the most serious challenges facing modern high-performance single-core processors: design complexity and power consumption [63][65][75].

Throughput-oriented workloads are likely to benefit from CMP architectures with modest effort. However, extending the performance potential of CMPs broadly to sequential applications remains a difficult problem. Conventional compiler approaches have largely failed to extract enough thread-level parallelism from single-threaded applications to take advantage of many cores [80][90], leaving it to the programmer to extract cost-effective parallelism.

To provide easy-to-use tools for developing parallel applications, industry and academia have built parallel runtime systems and libraries that let programmers focus on identifying parallelism rather than on how that parallelism is managed and mapped to the underlying architecture [15][38][39][42][56][76][81]. Dynamic management of parallelism, the ability to take created parallelism and dynamically assign it to available execution resources, is used by many runtime libraries, such as OpenMP and the Intel Threading Building Blocks runtime library, to improve performance. While parallel runtime libraries make parallel code easier to develop, software-based dynamic management of parallelism imposes a performance cost every time the runtime library is called to make a runtime decision. For aggressively annotated parallel code, software-based runtime libraries can expose management overheads that, when large enough, render the chosen parallelization approach cost-ineffective. Moreover, because parallelism management costs grow with core count, the performance portability of applications to large core counts is severely affected.

This dissertation proposes a low-overhead, low-latency dynamic parallelism management solution aimed at improving the performance of parallel applications. The proposed solution not only allows parallel applications to make effective use of large core counts, but also allows them to adapt gracefully to dynamic changes in system characteristics such as variations in core speed and core count. To this end, this work sets forth four overarching goals: (1) perform an in-depth characterization of two popular parallel runtime libraries to identify benefits and shortcomings in their dynamic management of parallelism; (2) provide a detailed study of how well software-based approaches mitigate, or fail to mitigate, performance heterogeneity caused by technology variations; (3) develop parallelism redistribution policies that use global information to improve load balancing and performance scalability; and (4) describe Squadron, a comprehensive framework aimed at providing superior performance through low-overhead, low-latency dynamic management of parallelism, which achieves performance improvements ranging from 18% to 13X over existing software-based solutions.

The end result of this dissertation is a detailed study of dynamic management of parallelism in software, as well as of its performance potential under hardware support. The characterization results presented in this work can help runtime system designers build better runtime systems by offering insights into some of the major sources of overhead that currently limit the scalability of software solutions. Squadron serves as the first step in developing an attractive solution for future CMP architectures that aim to offer superior parallel performance through specialized hardware support.
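The abstract describes dynamic management of parallelism as taking created parallelism and assigning it at runtime to available execution resources, and notes that every runtime decision carries a cost. The Python sketch below illustrates both points under simplifying assumptions. It is not Squadron or the dissertation's method, only a toy dynamic self-scheduling loop loosely in the spirit of OpenMP's `schedule(dynamic, chunk)` clause. The function `dynamic_schedule`, its `worker_delays` parameter (a simulated per-task cost for each core) and the lock-protected shared counter are hypothetical illustrations; threads and `time.sleep` stand in for real cores and real work.

```python
import threading
import time


def dynamic_schedule(num_tasks, worker_delays, chunk_size=1):
    """Hypothetical sketch: run num_tasks unit tasks on len(worker_delays)
    threads using dynamic self-scheduling. Whenever a worker is idle it asks
    the (software) "runtime" for the next chunk of task indices.

    worker_delays[w] is the simulated cost of one task on worker w, so
    unequal values model cores running at different speeds.
    Returns (list of task indices completed by each worker, runtime calls).
    """
    lock = threading.Lock()
    state = {"next": 0, "calls": 0}
    done = [[] for _ in worker_delays]

    def grab_chunk():
        # Each call is one trip into the runtime; this lock-protected
        # bookkeeping stands in for per-decision management overhead.
        with lock:
            state["calls"] += 1
            start = state["next"]
            end = min(start + chunk_size, num_tasks)
            state["next"] = end
            return start, end

    def worker(w):
        while True:
            start, end = grab_chunk()
            if start >= end:  # no work left: this worker exits
                return
            for task in range(start, end):
                time.sleep(worker_delays[w])  # simulated task execution
                done[w].append(task)

    threads = [threading.Thread(target=worker, args=(w,))
               for w in range(len(worker_delays))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done, state["calls"]


if __name__ == "__main__":
    # Worker 0 models a fast core, worker 1 a core roughly 20x slower.
    done, calls = dynamic_schedule(40, [0.001, 0.02], chunk_size=1)
    print("tasks per worker:", [len(d) for d in done])
    _, calls8 = dynamic_schedule(40, [0.001, 0.02], chunk_size=8)
    print("runtime calls:", calls, calls8)  # runtime calls: 42 7
```

Because idle workers pull work, the faster simulated core ends up completing most of the tasks without any static partitioning. The chunk size exposes the overhead trade-off the abstract describes: with `chunk_size=1` the 40 tasks require 42 runtime calls (one per task plus one final empty request per worker), while `chunk_size=8` requires only 7, at the price of coarser load balancing when core speeds differ.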

Keywords/Search Tags:

Parallelism, Dynamic management, CMP, Performance, Support, Runtime libraries

Related items

1	Efficient Runtime Support for Reliable and Scalable Parallelism
2	Analyzing and Accelerating Runtime Systems on Multicore Architecture
3	Exploiting Parallelism in Multicore Processors through Dynamic Optimizations
4	RCC: A compiler for the R language for statistical computing
5	Runtime resource management in concurrent systems
6	Runtime Support For Maximizing Performance on Multicore Systems
7	Runtime Optimization For Large-Scale Neural-Network Data-Parallelism Training
8	Runtime support for effective memory management in large-scale applications
9	A study of the job training needs of the support staff in the six Kansas Board of Regents university libraries
10	Research On Multi-core/Many-core Platform Oriented Speculative Parallelizing Technology