
Integrated compiler optimizations for tensor contractions

Posted on: 2009-01-02    Degree: Ph.D    Type: Dissertation
University: The Ohio State University    Candidate: Gao, Xiaoyang    Full Text: PDF
GTID: 1448390002498041    Subject: Computer Science
Abstract/Summary:
This dissertation addresses several performance optimization issues in the context of the Tensor Contraction Engine (TCE), a domain-specific compiler that synthesizes parallel, out-of-core programs for a class of scientific computations encountered in computational chemistry and physics. The domain of our focus is electronic structure calculations, where many computationally intensive components are expressible as a set of tensor contractions. These scientific applications are extremely compute-intensive and consume significant computer resources at national supercomputer centers, and the manual development of high-performance parallel programs for them is tedious and time-consuming. The TCE system is targeted at reducing this burden on application scientists by having them specify computations in a high-level form, from which efficient parallel programs are automatically synthesized.

The goal of this research is to develop an optimization framework that derives high-performance implementations for a given set of tensor contractions. In particular, the issues investigated include:

(1) Development of an efficient in-memory parallel algorithm for a tensor contraction. A tensor contraction is essentially a generalized matrix multiplication involving multi-dimensional arrays (see the first sketch after the abstract). A novel parallel tensor contraction algorithm is developed by extending Cannon's memory-efficient parallel matrix multiplication algorithm.

(2) Design of a performance-model-driven framework for a parallel out-of-core tensor contraction. For a parallel out-of-core tensor contraction, besides the in-core parallel algorithm used, several other factors affect the overall performance: the nested-loop structure (loop permutation), tile size selection, disk I/O placement, and the data partitioning pattern (see the second sketch after the abstract). The best choice depends on the characteristics of the target machine and the input data. We develop performance models for the different parallel out-of-core alternatives and use their predictions to drive our optimization techniques.

(3) Design of an integrated optimization framework for a set of tensor contractions. To improve locality and parallelism, many high-level program transformation techniques are incorporated in the TCE framework, such as loop fusion, fission, permutation, and tiling. Given a set of tensor contractions, the number of possible combinations of program transformations is too large for exhaustive enumeration. Our approach is to (a) model the interactions between different transformations and assess their impact on overall performance, (b) use a search-based strategy to explore the combined space of transformations, with efficient pruning methods to reduce the search space, and (c) use the performance models to identify the best combination and generate high-performance code.

(4) Incorporation of domain-specific optimizations. Tensors in the class of computations considered have a number of symmetry properties, which must be exploited to generate efficient code; however, tensor symmetry also restricts the applicability and effectiveness of some compiler optimizations. We study the effects of different symmetry properties on loop transformations and code generation, and adapt the optimization approaches so that they remain applicable to computations on tensors with symmetry properties (see the third sketch after the abstract).

We have implemented and evaluated most of the proposed algorithms and optimization strategies. Experimental results show that they work effectively in practice.
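First sketch. As a concrete illustration of the statement in item (1) that a tensor contraction is a generalized matrix multiplication, the following minimal Python/NumPy sketch (with illustrative shapes and index names that are not taken from the dissertation) computes a four-index contraction by permuting and reshaping the operands into matrices and multiplying them:

    # Minimal illustration (not TCE-generated code): the contraction
    # C[a,b,i,j] = sum_{c,k} A[a,c,i,k] * B[c,b,k,j]
    # reduces to a matrix multiplication after index permutation and reshaping.
    import numpy as np

    Na, Nb, Nc, Ni, Nj, Nk = 4, 5, 6, 3, 2, 7
    A = np.random.rand(Na, Nc, Ni, Nk)
    B = np.random.rand(Nc, Nb, Nk, Nj)

    # Reference result computed directly.
    C_ref = np.einsum('acik,cbkj->abij', A, B)

    # Group the external indices (a,i) and (b,j) into row/column dimensions
    # and the contracted indices (c,k) into the shared inner dimension.
    A_mat = A.transpose(0, 2, 1, 3).reshape(Na * Ni, Nc * Nk)   # rows (a,i), cols (c,k)
    B_mat = B.transpose(0, 2, 1, 3).reshape(Nc * Nk, Nb * Nj)   # rows (c,k), cols (b,j)
    C = (A_mat @ B_mat).reshape(Na, Ni, Nb, Nj).transpose(0, 2, 1, 3)  # back to [a,b,i,j]

    assert np.allclose(C, C_ref)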
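Second sketch. The following is a simplified, serial analogue (assumed sizes and tile shapes, not the TCE's generated out-of-core code) of the blocked loop structure behind the out-of-core formulation in item (2): tiles of the operands are brought into memory, and a tile of the result is accumulated and then written back. The out-of-core decisions discussed above correspond to the loop order (permutation), the tile-size parameters Ti/Tj/Tk, and where the read and write statements are placed.

    # Simplified blocked (tiled) contraction loop, shown for a 2-index case.
    import numpy as np

    Ni, Nj, Nk = 64, 48, 96          # illustrative problem sizes
    Ti, Tj, Tk = 16, 16, 32          # illustrative tile sizes (model-selected in the TCE)
    A = np.random.rand(Ni, Nk)
    B = np.random.rand(Nk, Nj)
    C = np.zeros((Ni, Nj))

    for i0 in range(0, Ni, Ti):              # tiles of i
        for j0 in range(0, Nj, Tj):          # tiles of j
            c_tile = np.zeros((Ti, Tj))      # result tile kept in memory
            for k0 in range(0, Nk, Tk):      # tiles of the contracted index k
                # In an out-of-core version these slices would be disk reads.
                a_tile = A[i0:i0+Ti, k0:k0+Tk]
                b_tile = B[k0:k0+Tk, j0:j0+Tj]
                c_tile += a_tile @ b_tile
            # In an out-of-core version this would be a disk write.
            C[i0:i0+Ti, j0:j0+Tj] = c_tile

    assert np.allclose(C, A @ B)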
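Third sketch. The following illustrates, under assumed shapes, the kind of symmetry exploitation mentioned in item (4): when the result is symmetric in its two external indices, only the triangular half needs to be computed, roughly halving the work, but the loop nest is no longer rectangular, which is what complicates standard loop transformations.

    # Symmetry-aware computation of T[i,j] = sum_k A[i,k] * A[j,k],
    # which is symmetric in (i,j) by construction.
    import numpy as np

    N, K = 6, 10
    A = np.random.rand(N, K)

    T_full = A @ A.T                 # full (redundant) computation

    T = np.zeros((N, N))
    for i in range(N):
        for j in range(i, N):        # triangular loop: only i <= j is computed
            T[i, j] = A[i, :] @ A[j, :]
            T[j, i] = T[i, j]        # the other half is filled by symmetry

    assert np.allclose(T, T_full)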
Keywords/Search Tags: Tensor contraction, Optimization, Compiler, TCE, Performance, Parallel, Algorithm