Transforming complex loop nests for locality

Posted on: 2003-09-30

Degree: Ph.D.

Type: Thesis

University: Rice University

Candidate: Yi, Qing

GTID: 2468390011479082

Subject: Computer Science

Abstract/Summary:

Over the past 20 years, increases in processor speed have dramatically outstripped performance increases for standard memory chips. To bridge this gap, compilers must optimize applications so that data fetched into caches are reused before being displaced. Existing compiler techniques can efficiently optimize simple loop structures such as sequences of perfectly nested loops. However, on more complicated structures, existing techniques are either ineffective or require too much computation time to be practical for a commercial compiler.

This thesis develops the following novel techniques to optimize complex loop structures both effectively and inexpensively for better locality.

Extended dependence representation: a matrix representation that incorporates dependence relations between iterations of arbitrarily nested loops.

Transitive dependence analysis algorithm: a new algorithm that improves the time complexity of existing transitive dependence analysis algorithms.

Dependence hoisting: a new loop transformation technique that permits the direct fusion and interchange of arbitrarily nested loops. The transformation is inexpensive and can be incorporated into most commercial compilers.

Computation slicing: a framework that systematically applies dependence hoisting to optimize arbitrary loop structures for better locality.

Recursion transformation: the first compiler work that automatically transforms loop structures into recursive form to exploit locality simultaneously at multiple levels of the memory hierarchy.

Both the computation slicing framework and the recursion transformation have been implemented and applied to successfully optimize a collection of benchmarks. In particular, the slicing framework has successfully blocked four linear algebra kernels: Cholesky, QR, LU factorization without pivoting, and LU with partial pivoting. The auto-blocked versions have achieved performance improvements similar to those attained by manually blocked programs in LAPACK [7]. The automatic blocking of QR and pivoting LU is a notable achievement because these kernels include loop nests that are considered difficult; to our knowledge, few previous compiler implementations have completely automated the blocking of the loop nests in these kernels. These results indicate that the computation slicing framework, although much less expensive than existing, more general transformation frameworks [34, 42, 2, 36, 49], can in practice match or exceed their effectiveness.
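
As a rough illustration of what "blocked" and "recursive" forms of a loop nest look like, the sketch below rewrites a plain matrix multiply by hand. It is not the thesis's algorithm and not one of the four kernels named above: matrix multiply is a much simpler, perfectly nested case chosen only for brevity, and nothing here performs dependence analysis. The function names, the tile size `bs`, and the recursion cutoff `leaf` are all assumptions made for this example.

```python
def matmul_naive(a, b, n):
    """Reference i-j-k triple loop over n x n matrices stored as lists of rows."""
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_blocked(a, b, n, bs):
    """Same computation visited tile by tile (bs is a hypothetical tile size).

    The three outer loops pick a bs x bs tile; the three inner loops finish
    all work on that tile before moving on, so its data can be reused while
    it is still cached.
    """
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, bs):
        for kk in range(0, n, bs):
            for jj in range(0, n, bs):
                for i in range(ii, min(ii + bs, n)):
                    for k in range(kk, min(kk + bs, n)):
                        aik = a[i][k]
                        for j in range(jj, min(jj + bs, n)):
                            c[i][j] += aik * b[k][j]
    return c


def matmul_recursive(a, b, c, i0, i1, j0, j1, k0, k1, leaf=2):
    """Recursive form: halve the longest index range until it is <= leaf.

    Accumulates into c. Because the problem keeps shrinking, some recursion
    level fits each cache level without a tuned tile size. Assumes leaf >= 1.
    """
    di, dj, dk = i1 - i0, j1 - j0, k1 - k0
    if max(di, dj, dk) <= leaf:
        for i in range(i0, i1):
            for k in range(k0, k1):
                for j in range(j0, j1):
                    c[i][j] += a[i][k] * b[k][j]
        return
    if di >= dj and di >= dk:
        m = i0 + di // 2
        matmul_recursive(a, b, c, i0, m, j0, j1, k0, k1, leaf)
        matmul_recursive(a, b, c, m, i1, j0, j1, k0, k1, leaf)
    elif dj >= dk:
        m = j0 + dj // 2
        matmul_recursive(a, b, c, i0, i1, j0, m, k0, k1, leaf)
        matmul_recursive(a, b, c, i0, i1, m, j1, k0, k1, leaf)
    else:
        m = k0 + dk // 2
        matmul_recursive(a, b, c, i0, i1, j0, j1, k0, m, leaf)
        matmul_recursive(a, b, c, i0, i1, j0, j1, m, k1, leaf)
```

All three variants perform the same multiply-add operations and differ only in the order in which iterations are visited, which is the property a locality transformation has to preserve. For kernels such as QR or LU with partial pivoting the loops are not perfectly nested, so a reordering of this kind cannot be written down as directly.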

Keywords/Search Tags:

Loop, Slicing framework, Computation slicing, Locality

Related items

1	Research On Theory And Application Of Program Slicing
2	Research And Implementation Of Slicing Algorithms In Rapid Prototyping
3	Implementation Of Static Slicing Tools For JavaScript Based On WALA
4	Research On The Slicing Algorithm And Its Realization Based On AMF
5	Study On The Technology Of Triangulated Model Reconstruction Based On Slicing Data
6	The Research Of Coarse-Grained Object-Level Slicing Method
7	Research On Intelligent Configuration And Resource Scheduling For Network Slicing
8	Designing And Researching The Localization Method For Refactoring Based On Slicing Metrics
9	Resource Mapping For Sliced Mobile Communication Network
10	A Study Of Optimization Of Slicing Direction And Slicing Algorithm In 3D Printing