Compiler-based memory optimizations for high performance computing systems

Posted on:2014-05-15

Degree:Ph.D

Type:Dissertation

University:The Pennsylvania State University

Candidate:Kultursay, Emre

GTID:1458390005495689

Subject:Engineering

Abstract/Summary:

Parallelism has always been the primary means of achieving higher performance. To advance the computational capabilities of state-of-the-art high performance computing systems, we continue to rely on increasing parallelism by putting more processors into systems and integrating more cores and more logic into each processor. However, the returns from increasing parallelism are diminishing: adding more chips, more cores, and more logic already yields smaller and smaller performance improvements. The primary cause of this behavior is memory. Because memory systems scale poorly, processor performance grows faster than memory bandwidth, latency, and energy efficiency improve. The memory systems in high performance computers can no longer supply data to the parallel computational units at a high enough rate, with low enough latency, and at a low enough energy cost. This memory problem plagues the design of high performance systems, and scaling trends show that managing and accessing memory efficiently is one of the most crucial challenges in realizing exascale systems.

This dissertation focuses on compiler-based methods to address these memory issues. Specifically, it proposes and evaluates various compiler-based memory optimizations that improve the memory behavior of both application-specific and general purpose high performance computing systems.

The first part of this dissertation identifies the memory scalability problem in the design of application-specific hardware accelerators and proposes a compiler-based automatic memory partitioning method to address it. This method generates energy-efficient, high-bandwidth, low-latency memory systems and enables the generation of high performance accelerators that can scale up to a very large number of custom-designed chips.

The second part of this dissertation targets general purpose systems and attacks the memory latency and bandwidth problem in the many-core processors used as building blocks for general purpose high performance computing systems. It presents compiler generation of software prefetching and streaming store instructions and shows their effectiveness in hiding long memory latencies and saving precious memory bandwidth on a cutting-edge many-core processor.
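
The abstract does not spell out the partitioning algorithm, so the following is only a generic illustration of why memory partitioning matters for accelerators, not the dissertation's method. It assumes single-ported banks that each serve one access per cycle and a loop that reads A[i] through A[i + width - 1] in the same cycle. Cyclic partitioning (element j goes to bank j mod N) is compared with block partitioning (contiguous chunks per bank). All function names (cyclic_map, block_map, conflict_free_window, count_conflict_free) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Physical location of one logical array element (hypothetical type). */
typedef struct {
    size_t bank;   /* which memory bank holds the element */
    size_t offset; /* address of the element within that bank */
} bank_addr;

/* Cyclic partitioning: consecutive elements go to consecutive banks. */
static bank_addr cyclic_map(size_t j, size_t num_banks) {
    assert(num_banks > 0 && "need at least one bank");
    bank_addr a = { j % num_banks, j / num_banks };
    return a;
}

/* Block partitioning, for contrast: each bank holds one contiguous chunk. */
static bank_addr block_map(size_t j, size_t n, size_t num_banks) {
    assert(num_banks > 0 && "need at least one bank");
    size_t chunk = (n + num_banks - 1) / num_banks; /* ceil(n / num_banks) */
    bank_addr a = { j / chunk, j % chunk };
    return a;
}

/* True if the `width` reads A[i] .. A[i + width - 1], issued in the same
   cycle, all hit distinct banks (so single-ported banks can serve them). */
static bool conflict_free_window(size_t i, size_t width, size_t n,
                                 size_t num_banks, bool cyclic) {
    if (num_banks == 0 || num_banks > 64) return false; /* sketch limit */
    bool used[64] = { false };
    for (size_t k = 0; k < width; k++) {
        size_t j = i + k;
        size_t b = cyclic ? cyclic_map(j, num_banks).bank
                          : block_map(j, n, num_banks).bank;
        if (used[b]) return false; /* two reads want the same bank */
        used[b] = true;
    }
    return true;
}

/* Number of loop iterations i in [0, n - width] whose reads can all be
   served in a single cycle under the chosen partitioning. */
static size_t count_conflict_free(size_t n, size_t width,
                                  size_t num_banks, bool cyclic) {
    size_t ok = 0;
    for (size_t i = 0; i + width <= n; i++)
        if (conflict_free_window(i, width, n, num_banks, cyclic)) ok++;
    return ok;
}
```

For a 16-element array, a 4-wide access window, and 4 banks, every one of the 13 windows is conflict-free under cyclic placement, while none is under block placement, because four consecutive elements then share at most two banks. This is the kind of access-pattern analysis that lets a compiler choose a partitioning instead of leaving it to the designer.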

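The abstract does not show the generated code or name the target processor, so the following hand-written C only approximates the kind of loop a compiler might emit after inserting software prefetches and streaming (non-temporal) stores. It assumes GCC or Clang (for `__builtin_prefetch`) and, where SSE2 is available, an x86 target (for the `_mm_stream_si32` and `_mm_sfence` intrinsics); other targets fall back to ordinary stores. The function `scale_stream` and the distance `PREFETCH_DIST` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* _mm_stream_si32; also pulls in _mm_sfence */
#endif

/* Hypothetical prefetch distance in elements; a compiler would derive it
   from the target's memory latency and the loop's per-iteration cost. */
#define PREFETCH_DIST 64

/* dst[i] = scale * src[i], with software prefetching of the input and
   streaming stores for the output, which is written once and not re-read. */
static void scale_stream(int *restrict dst, const int *restrict src,
                         size_t n, int scale) {
    assert(n == 0 || (dst != NULL && src != NULL));
    for (size_t i = 0; i < n; i++) {
        /* Request a future input element now so that, ideally, it is in
           cache when the loop reaches it. rw = 0 (read), locality = 0
           (no temporal locality). Guarded to stay inside the array. */
        if (i + PREFETCH_DIST < n)
            __builtin_prefetch(&src[i + PREFETCH_DIST], 0, 0);
        int v = scale * src[i];
#if defined(__SSE2__)
        /* Non-temporal store: bypasses the cache, which can avoid the
           read-for-ownership traffic an ordinary store miss incurs. */
        _mm_stream_si32(&dst[i], v);
#else
        dst[i] = v; /* portable fallback: ordinary store */
#endif
    }
#if defined(__SSE2__)
    /* Streaming stores are weakly ordered; fence before dst is consumed. */
    _mm_sfence();
#endif
}
```

The prefetch distance is the main tuning knob: too short and the data still arrives late, too long and the prefetched lines may be evicted before use. Streaming stores only pay off when the written data will not be reused soon, which is why a compiler must analyze reuse before emitting them.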
Keywords/Search Tags:

Memory, High performance, Compiler-based, Bandwidth