Font Size: a A A

Automatic Library Generation and Performance Tuning for Modular Polynomial Multiplication

Posted on:2016-03-23Degree:Ph.DType:Thesis
University:Drexel UniversityCandidate:Meng, LingchuanFull Text:PDF
GTID:2478390017483913Subject:Computer Science
Abstract/Summary:
Polynomial multiplication is a key algorithm underlying computer algebra systems (CAS) and its efficient implementation is crucial for the performance of CAS. In this context coefficients of polynomials come from domains such as the integers, rationals and finite fields where arithmetic is performed exactly without rounding error.;Obtaining peak performance on modern computer architectures is difficult due to the complexity of the memory hierarchy, and the need for efficient parallelism on multiple levels ranging from instruction level parallelism (ILP) and vector parallelism to multi-core parallelism. Tuning the performance of code to fully utilize all of these components requires modifying the implementation to the particular architecture and extensive experimentation to determine the best choices of algorithm and implementation strategies. Moreover, this time consuming process must be repeated whenever the architecture changes, as the particular choices for one machine are not optimal for another, even when relatively minor changes are made in the architecture.;In numeric (floating point based) scientific and mathematical libraries where performance is crucial, considerable effort has been made to optimize key routines across multiple architectures. This performance tuning has been aided by the use of automation where many code choices are generated and intelligent search is utilized to find the "best" implementation on a given architecture. The performance of autotuned implementations is comparable to, and in some cases better than, the best hand-tuned code.;In this thesis we design and implement algorithms for polynomial multiplication using a fast Fourier transform (FFT) based approach. We improve on the state-of-the-art in both theoretical and practical performance. The SPIRAL library generation system is used to automatically generate and tune the performance of a polynomial multiplication library that is optimized for memory hierarchy, vectorization and multi-threading, using new and existing algorithms.;Many implementations of FFT-based polynomial multiplication use a power of two FFT. Inputs are zero-padded to the smallest power of two greater than the output size, and the resulting performance exhibits a "staircase" phenomenon where the computing time for problem sizes between powers of two is essentially equal to the time for using the larger power-of-two transform.;New algorithms for the truncated Fourier transform (TFT) and the inverse (ITFT) are presented that smooth the staircase phenomenon by exploiting the fact that some of the inputs are zero and not all of the outputs are needed. The new general-radix algorithms presented in thesis have the same asymptotic complexity of the fastest existing approaches, but extend the search space so that implementations can best adapt to the memory hierarchy. The new parallel algorithms introduce a small relaxation for larger problem sizes which trades off slightly higher arithmetic cost for improved data flow which allows full vectorization and parallelization.;The SPIRAL library generator is extended and enhanced to completely automate library implementation and optimization for polynomial multiplication, including the modular discrete Fourier transform, truncated Fourier transform, convolution and polynomial multiplication. The input to the generator is a specification of recursive algorithms for polynomial multiplication expressed in a high-level domain-specific language; the output is a vectorized and multi-threaded C++ library that supports general input size. The resulting libraries smooth out the staircase phenomenon while retaining the performance associated with state-of-the-art power of two libraries.
Keywords/Search Tags:Performance, Polynomial multiplication, Library, Implementation, Tuning, Fourier transform
Related items