
Key Compilation Techniques For High Productivity Computing: Precision, Performance And Power Consumption

Posted on: 2008-06-30
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Q Yang
Full Text: PDF
GTID: 1118360242999344
Subject: Computer Science and Technology
Abstract/Summary:
With the development of computer science and technology, high productivity computing has been proposed to address several outstanding problems in the field of high-performance computing: sustained real-world performance, ease of parallel programming, lower cost, portability, and robustness. At the same time, round-off errors increasingly degrade the precision of large-scale numerical simulations in safety-critical applications, so high productivity should also encompass high-precision floating-point arithmetic wider than the 64-bit binary format. These five concerns interact with one another: for example, high-precision floating-point arithmetic improves the confidence of numerical simulation, but it also demands more aggressive performance optimization and lower cost.

Compilation technology is a fundamental enabler of high productivity. This dissertation studies three basic compilation problems: high-precision floating-point arithmetic, optimization techniques for high-precision operations, and compilation techniques for low power. It makes five main contributions:

1. Building on hardware support for double-extended precision floating-point arithmetic, we designed and implemented a FORTRAN compiler named CCRG. CCRG supports double-extended precision floating-point arithmetic, which commercial compilers do not, and can effectively improve floating-point precision in scientific computing. The precision-sensitive BBP algorithm for computing Pi is used to validate the compiler.

2. We present an inlining algorithm for the exponential function that combines the well-known table-driven algorithm with parallel polynomial evaluation; the approach generalizes to inlining other transcendental functions. On this basis we designed and implemented inlining algorithms for the power, division, square root, and exponential functions.
These inlining algorithms not only reduce function-call overhead but also enable further compiler optimizations. Typical experimental results show that inlining the mathematical library functions improves the performance of double-extended precision floating-point operations by 17.8%.

3. Using array reference information from the compiler front end, we present a dependence-analysis algorithm for affine array subscripts based on splitting Chains of Recurrences, and we improve an existing algorithm for non-affine subscripts. Both algorithms strengthen dependence analysis for accesses to linearized arrays and make the CCRG compiler more effective at loop transformation and data-locality optimization.

4. We present and implement several compiler optimizations oriented to instruction-level parallelism, including alias analysis, automatic function inlining, data prefetching, and post-increment loads and stores. These optimizations effectively alleviate memory stalls in high-precision arithmetic. Experimental results show that they improve the performance of double-extended precision floating-point operations by 42.0%; the resulting code is 66.7% faster than that produced by the GCC compiler, reaches 70.7% of the performance of the Intel double-precision commercial compiler, and is 15.8 times faster than Intel quad-precision.

5. We present low-power optimization techniques for MPI_Barrier and low-energy optimization techniques for MPI_Reduce and MPI_Bcast in an MPI implementation for double-extended precision floating-point arithmetic. Benchmarks with NPB3.2-MPI at CLASS C show that the energy reduction for MPI_Barrier reaches a maximum of 19.2% and an average of 5.2%. The energy of the MG3D program is reduced by 17.7% and 14.2% when using the optimized MPI_Reduce and MPI_Bcast, respectively.
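To make the data-prefetching optimization in contribution 4 concrete, the sketch below shows the kind of transformation a compiler might apply to a `long double` reduction loop: issuing a prefetch a fixed number of iterations ahead so the memory access overlaps with the arithmetic. This is a hand-written illustration, not CCRG's actual output; the prefetch distance is arbitrary and `__builtin_prefetch` is a GCC/Clang-specific builtin standing in for the compiler-inserted prefetch instruction.

```c
#include <assert.h>
#include <stddef.h>

/* Sum an array of double-extended values with software prefetching.
 * DIST is the prefetch distance in elements; a compiler would pick it
 * from the cache-miss latency and the loop's cycle count. */
static long double sum_prefetch(const long double *a, size_t n) {
    const size_t DIST = 16;                     /* illustrative distance */
    long double s = 0.0L;
    for (size_t i = 0; i < n; i++) {
        if (i + DIST < n)
            /* args: address, 0 = read, 1 = low temporal locality */
            __builtin_prefetch(&a[i + DIST], 0, 1);
        s += a[i];
    }
    return s;
}
```

The prefetch is a hint with no architectural side effect, so the loop's result is identical with or without it; only the memory-stall behavior changes, which is exactly why such optimizations matter for wide floating-point data that consumes cache capacity quickly.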
Keywords/Search Tags:High Productivity Computing, Double-Extended Precision Floating-Point Arithmetic, IA-64 Architecture, Compilation Optimization, Low-Power Optimization