
Programming model and execution model for OpenMP on the Cyclops-64 manycore processor

Posted on: 2011-09-16
Degree: Ph.D.
Type: Thesis
University: University of Delaware
Candidate: Gan, Ge
Full Text: PDF
GTID: 2448390002964325
Subject: Engineering
Abstract/Summary:
Programming a chip with many processing cores is difficult, especially when the chip has a user-managed memory hierarchy, as the IBM Cyclops-64 many-core processor does. If the heterogeneity of its memory space is ignored, Cyclops-64 resembles a traditional shared-memory SMP machine. For this kind of parallel machine, OpenMP [5] is the dominant programming language. Although OpenMP provides abundant directives that programmers can use to decompose the loops of a sequential program and turn it into a parallel one, it offers little support for dealing with a segmented memory space. Significant problems therefore arise when a programmer develops OpenMP programs for the Cyclops-64 processor: for example, the original program must be recoded by hand to add data movement code, and the execution of data movement code must be overlapped with that of computation code. This motivates us to develop a series of tile aware parallelization techniques to attack these problems. The basic idea is to enhance the OpenMP API with the concept of a data tile, so that programmers can use the extended API to annotate their programs and tell the compiler the shape of a data tile, how it is used in the program, where the data tiles are located, and so on. The purpose is to expose more information about program data and its usage, giving the compiler more opportunities to perform aggressive optimizations that would be impossible (or inefficient, or inaccurate) without such hints from the programmer.

The major contributions of this thesis are:

(1) We introduce the concept of tile aware parallelization, an extension to current OpenMP. We analyze and discuss the problems that OpenMP programmers encounter on the Cyclops-64 processor, and then use motivating examples to demonstrate why tile aware parallelization techniques are necessary and why they are able to solve these problems. As far as we are aware, we are the first to propose tile aware parallelization for the OpenMP programming language.

(2) We propose and develop tile percolation, an OpenMP tile aware parallelization technique that generates data percolation code for OpenMP programs running on the Cyclops-64 processor. The thesis explores the necessity and feasibility of pragma directives for semi-automatic generation of data movement code in OpenMP. It also describes the techniques used to implement tile percolation, which include the new programming API, code generation, and the required runtime support. Evaluation results show that tile percolation makes OpenMP programs run much more efficiently on the Cyclops-64 chip.

(3) To improve the tile percolation technique, we have designed and developed the Thread-Level Decoupled Access/Execution (TL-DAE) model for OpenMP programs running on the Cyclops-64 chip. We have designed the TL-DAE programming interfaces that help an OpenMP compiler generate decoupled code, and developed the runtime support that the TL-DAE execution model requires. The experimental results demonstrate the effectiveness of the TL-DAE execution model.

(4) We have proposed and developed an OpenMP tile aware parallelization technique called tile reduction, which applies parallel reduction to multi-dimensional arrays. We discuss the methods used to implement tile reduction, including the required OpenMP API extension and the associated code generation technique, and evaluate the technique with a set of benchmarks. The experimental results show that tile reduction makes code parallelization more natural and flexible: it not only exposes more parallelism in the program but also improves its data locality.

(Abstract shortened by UMI.)
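
To make the idea of reducing into a tile rather than a scalar concrete, the following is a minimal sketch in C. It does not use the thesis's own API extension, which this abstract does not show. It relies instead on the array-section form of the standard `reduction` clause introduced in OpenMP 4.5, as a rough analogue. The function name `column_sums` and the sizes `ROWS` and `COLS` are hypothetical, chosen only for illustration.

```c
#include <stdio.h>

#define ROWS 8   /* hypothetical problem size, for illustration only */
#define COLS 4   /* width of the one-dimensional "tile" being reduced */

/* Sum the rows of `in` into the array `out`.
 * Assumption: this uses the standard OpenMP 4.5 array-section reduction,
 * not the tile reduction extension proposed in the thesis.
 * Each thread accumulates into a private copy of out[0:COLS]; the
 * OpenMP runtime combines the private copies when the loop ends.
 * If the compiler ignores the pragma, the loop runs sequentially
 * and produces the same result. */
void column_sums(int in[ROWS][COLS], int *out)
{
    for (int j = 0; j < COLS; j++)
        out[j] = 0;

    #pragma omp parallel for reduction(+ : out[0:COLS])
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            out[j] += in[i][j];
}
```

Here the outer loop over rows is parallelized even though every iteration updates the same output elements, which is the kind of parallelism a scalar-only reduction clause cannot express directly.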
Keywords/Search Tags: Program, OpenMP, Cyclops-64, Execution model, Tile aware parallelization, Data, Processor, Chip