Font Size: a A A

Optimizing locality and parallelism through program reorganization

Posted on:2009-02-04Degree:Ph.DType:Dissertation
University:The Ohio State UniversityCandidate:Krishnamoorthy, SriramFull Text:PDF
GTID:1448390002494394Subject:Computer Science
Abstract/Summary:
Development of scalable application codes requires an understanding and exploitation of the locality and parallelism in the computation. This is typically achieved through optimizations by the programmer to match the application characteristics to the architectural features exposed by the parallel programming model. Partitioned address space programming models such as MPI foist a process-centric view of the parallel system, increasing the complexity of parallel programming. Typical global address space models provide a shared memory view that greatly simplifies programming. But the simplified models abstract away the locality information, precluding optimized implementations. In this work, we present techniques to reorganize program execution to optimize locality and parallelism, with little effort from the programmer.; For regular loop-based programs operating on dense multi-dimensional arrays, we propose an automatic parallelization technique that attempts to determine a parallel schedule in which all processes can start execution in parallel. When the concurrent tiled iteration space inhibits such execution, we present techniques to re-enable it. This is an alternative to incurring the pipelined startup overhead in schedules generated by prevalent approaches.; For less structured programs, we propose a programming model that exposes multiple levels abstraction to the programmer. These abstractions enable quick prototyping coupled with incremental optimizations. The data abstraction provides a global view of distributed data organized as blocks. A block is a subset of data stored contiguously in a single process' address space. The computation is specified as a collection of tasks operating on the data blocks, with parallelism and dependence being specified between them. When the blocking of the data does not match the required access pattern in the computation, the data needs to be reblocked to improve spatial locality. We develop efficient data layout transformation mechanisms for blocked multi-dimensional arrays. We also present mechanisms for automatic management of load balance, disk I/O, and inter-process communication on computations expressed as sets of independent tasks on blocked data stored on disk.
Keywords/Search Tags:Locality and parallelism, Data, Computation
Related items