Font Size: a A A

Automatic Generation And Optimization Of Data Permutation Instructions For SIMD Devices

Posted on:2011-07-18Degree:MasterType:Thesis
Country:ChinaCandidate:X ChenFull Text:PDF
GTID:2178330338989963Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
To improve the performance of computation intensive applications, such as multimedia, encoding/decoding, encryption/decryption, etc., almost all current modern microprocessor integrated SIMD (Single Instruction Multiple Data) functional units, and extended baseline instruction set to support SIMD instructions to fully exploit the enounours data level parallelism contained in these kinds of applications.Ideally, these extended SIMD instruction set can lead to high performance speedup, but current compilers cannot take full advantage of these SIMD instructions to achieve satisfactory performance because of following two reasons: (1) all SIMD instructions are register-to-register type, therefore, the width of operands must be similar to that of vector registers; (2) most SIMD memory access unit supports only contiguous, aligned memory access. For applications can not meet above two requirements, extra data permutation instructions should be inserted to generated operands for SIMD instructions. But unfortunately, these data permutation instructions have become the main source of performance loss in current SIMD programming.Based on the analysis of current compilation strategies for data permutation generation and optimization, it can be easily found that in most mordern compilers they are implemented in two passes. In the first one, permutation instructions are generated, but many of them are redundant. Then in the second one, these redundant permutation instructions are eliminated as possible. However, the result of optimization is usually unsatisfactory.To solve this problem, this paper proposes a new intermediate representation URSS (Unified Representation for Scalar and SIMD). Both scalar and SIMD instructions can be represented with it. Based on URSS, an automatic vectorization algorithm is designed and impelemented, in which only some non-redundant permutation instructions are generated, while other permutations are represented by the conflict edges in data flow graphs. Then, an algorithm for conflict edge identification and optimization is proposed. Above works have been implemented in SUIF2 compiler framework. Experimental results oriented to kernel and MiBench benchmark indicate the correctness and effectiveness of our works.
Keywords/Search Tags:SIMD, Intermediate Representation, Data Permutation, Conflict Edge, compilation optimization
PDF Full Text Request
Related items