Compiler-based memory optimizations for high performance computing systems

Posted on: 2014-05-15
Degree: Ph.D.
Type: Dissertation
University: The Pennsylvania State University
Candidate: Kultursay, Emre
Full Text: PDF
GTID: 1458390005495689
Subject: Engineering
Abstract/Summary:
Parallelism has always been the primary method to achieve higher performance. To advance the computational capabilities of state-of-the-art high performance computing systems, we continue to rely on increasing parallelism by putting more processors into systems and integrating more cores and more logic into the processors. However, the returns from increasing parallelism are diminishing. Adding more chips, more cores, and more logic has already started to yield smaller and smaller improvements in performance. The primary cause of this behavior is memory. The scalability problem of memory systems translates into a discrepancy between the increase in processor performance and the improvements in memory bandwidth, latency, and energy efficiency. The memory systems in high performance computers are no longer able to provide data to the parallel computational units at a fast enough rate, with low enough latency and energy consumption. This memory problem plagues the design of high performance systems, and scaling trends show that managing and accessing memory efficiently is one of the most crucial challenges for realizing exascale systems.

This dissertation focuses on compiler-based methods to address these memory issues. Specifically, it proposes and evaluates several compiler-based memory optimizations that improve the memory behavior of application-specific and general purpose high performance computing systems.

The first part of this dissertation identifies the memory scalability problem in the design of application-specific hardware accelerators and proposes a compiler-based automatic memory partitioning method to address this issue. This method generates energy-efficient, high-bandwidth, low-latency memory systems and enables the generation of high performance accelerators that can scale up to a large number of custom-designed chips.

The second part of this dissertation targets general purpose systems and attacks the memory latency and bandwidth problem in many-core processors that are used as building blocks for general purpose high performance computing systems. It presents compiler generation of software prefetching and streaming store instructions and shows their effectiveness in hiding long memory latencies and saving precious memory bandwidth on a cutting-edge many-core processor.
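
As a rough illustration of the software prefetching mentioned above, the following C sketch shows the kind of loop a compiler might produce when it inserts prefetch requests ahead of the current access. It is not taken from the dissertation: the function name `sum_with_prefetch` and the constant `PREFETCH_DISTANCE` are hypothetical, and the code assumes a GCC- or Clang-compatible compiler that provides the `__builtin_prefetch` builtin. Streaming stores are not shown because they depend on target-specific instructions.

```c
#include <stddef.h>

/* Hypothetical prefetch distance, in array elements. A real compiler would
   derive it from the memory latency and the work done per iteration. */
#define PREFETCH_DISTANCE 16

/* Sum an array while requesting data PREFETCH_DISTANCE elements ahead,
   so that the memory access overlaps with the additions. */
double sum_with_prefetch(const double *a, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* GCC/Clang builtin: address, rw (0 = read), locality hint (0..3). */
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 3);
        }
        total += a[i];
    }
    return total;
}
```

The prefetch is only a hint and does not change the result of the loop; whether it helps depends on the access pattern and on the hardware prefetchers already present in the target processor.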
Keywords/Search Tags: Memory, High performance, Compiler-based, Bandwidth