'C-Level' Programming of Parallel Coprocessor Accelerators

Posted on: 2011-05-11
Degree: Ph.D.
Type: Dissertation
University: University of Washington
Candidate: Ylvisaker, Benjamin
Full Text: PDF
GTID: 1448390002461420
Subject: Engineering
Abstract/Summary:
We believe that FPGA-like parallel coprocessor accelerators can be programmed efficiently at the "C level" of abstraction. To support this claim we define an abstract architectural model of accelerators that conveys the kind of high-level behavior and performance characteristics that the von Neumann model conveys to programmers of conventional processors. Using the model as a guide, we define a programming language and compilation strategy that: 1. do not impose programming style restrictions that are not inherent in the model, 2. do not introduce serious inefficiencies, and 3. are performance portable across implementations of the model.

In this dissertation I describe C-level programming of accelerators broadly, and make three particular contributions to the programmability of accelerators.

Enhanced loop flattening is a new method for translating loop nests with arbitrary static control flow into a form that can be efficiently pipelined with conventional algorithms designed for simple loops. This method advances the goal of supporting a wide set of programming styles with reasonable efficiency.

Parallel accelerators have statically managed resources, such as local memories, that vary widely in capacity from one implementation to the next. To get close to peak performance, applications must be tuned to the specific resources available in a given implementation, and empirical auto-tuning is an attractive way to do that. I propose and evaluate a new probabilistic auto-tuning method that elegantly handles situations where many possible configurations of the application fail to work at all because they exceed some architectural resource limit.

For many applications, achieving good performance on parallel accelerators requires deep loop pipelining, which in turn requires dramatically reordering the individual operations in the application. Local dependencies between operations can be respected by compilers relatively easily, but non-local dependencies force implementations to choose among three options: conservatively not reordering operations (which might kill performance), proving that reordering preserves the meaning of the program (which is impossible in the general case), or making unsound transformations (which programmers generally dislike). I propose a mostly sequential operational semantics for C-level streaming languages targeted at parallel accelerators. This semantics offers enough flexibility to the implementation to achieve good performance, deviates from conventional program-order semantics in fairly modest and understandable ways, and provides tools with which the programmer can control the reordering performed by the implementation.

These innovations are evaluated in the context of Macah, a new C-like language developed in the Mosaic group at the University of Washington. For validation we use a number of compute-intensive benchmarks developed by members of the Mosaic group and other contributors.
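
To make the loop flattening idea from the abstract concrete, here is a minimal sketch in plain C of the classic, basic form of loop flattening. It is not the enhanced method the dissertation proposes, and it is not Macah code. The function names `sum_nested` and `sum_flattened`, and the ragged-array layout they use, are assumptions chosen only for illustration. The point is that a two-level nest with a varying inner trip count can be rewritten as a single loop that carries the indices as explicit state, which is the shape that pipelining algorithms for simple loops expect.

```c
#include <assert.h>
#include <stddef.h>

/* Nested form: sums a ragged array stored contiguously in `data`.
 * Row i holds row_len[i] elements, so the inner trip count varies. */
long sum_nested(const int *data, const size_t *row_len, size_t rows) {
    long total = 0;
    size_t k = 0; /* running position in the contiguous data array */
    for (size_t i = 0; i < rows; i++) {
        for (size_t j = 0; j < row_len[i]; j++) {
            total += data[k++];
        }
    }
    return total;
}

/* Flattened form: one loop, with the outer and inner indices kept as
 * explicit state. Each iteration either performs one inner-body step
 * or advances to the next row. */
long sum_flattened(const int *data, const size_t *row_len, size_t rows) {
    assert(data != NULL && row_len != NULL);
    long total = 0;
    size_t i = 0, j = 0, k = 0;
    while (i < rows) {
        if (j < row_len[i]) {
            total += data[k++]; /* inner loop body */
            j++;
        } else {
            i++;                /* outer loop increment */
            j = 0;              /* inner loop re-initialization */
        }
    }
    return total;
}
```

Both functions compute the same result. The flattened version spends one extra iteration per row on bookkeeping, which is the kind of overhead a more refined flattening scheme would aim to reduce.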
Keywords/Search Tags:Accelerators, Parallel, Programming