Font Size: a A A

Register pressure guided loop optimization

Posted on:2008-04-21Degree:Ph.DType:Dissertation
University:Michigan Technological UniversityCandidate:Ma, YinFull Text:PDF
GTID:1448390005475434Subject:Computer Science
Abstract/Summary:
Digital Signal Processing(DSP) processors are a type of processor used for processing digital signals that are utilized in a very broad field. However, uncarefully designed loop optimizations implemented in an optimizing compiler for DSP processors cannot always deliver performance gain. Some reasons include causing too much register pressure or adding too much communication between register files to transfer values, called inter-cluster communication.;To control register pressure, predicting the register requirement before applying loop optimization can effectively prevent performance degradation. In this dissertation, we focus on two essential loop optimizations: scalar replacement and unroll-and-jam. We present two low cost register prediction methods for those loops in a high level representation with the consideration of other loop optimizations and general scalar optimizations before applying them. For unroll-and-jam, a performance model is also described to utilize prediction results to determine the unroll vector automatically from a given unroll space for achieving the best run-time performance.;Our prediction algorithm for scalar replacement predicts the floating-point register pressure of a loop within 2 registers and the integer register pressure within 2.7 registers on average with a time complexity of O( n2) in practice where n is the number of nodes in the data dependence graph used. This algorithm achieves similar performance to the best previous approach, having O(n 3) complexity. For the prediction algorithm for unroll-and-jam, our experiments show that it predicts the floating point register pressure within 3 registers and the integer register pressure within 4 registers. With this algorithm, for 92% of the test loops in our test suite, the performance model can pick the unroll vectors that achieve the best loop performance or performance close to the best. Also for the Polyhedron benchmark, our register pressure guided unroll-and-jam improves the overall performance about 2% over the model in the industry-leading optimizing Open64 backend on both 32bit and 64bit model for x86 and x86-64 architectures.;For inter-cluster communications, in this dissertation, a fusion algorithm is presented to consider the impacts from unroll-and-jam and scalar replacement and other optimizations for clustered VLIW architectures in order to provide the best overall performance as well as the minimum additional inter-cluster communications. In the experiments, this fusion algorithm applied with unroll-and-jam and scalar replacement speeds up all test loops from a factor of average 1.57 to 1.69, compared with the results by the similar optimizations but without fusion.;With the register pressure prediction algorithms and the demonstration of register pressure guided loop optimization, our research opens the door to completely eliminate the performance degradation of loop optimizations due to register pressure in the future. Loop fusion considering unroll-and-jam also helps a compiler to get better performance on a clustered VLIW architecture with a partitioned register bank.
Keywords/Search Tags:Register, Loop, Performance, Unroll-and-jam, Scalar replacement
Related items