Research Of SIMD Vectorization Algorithm And Regrouping Technology

Posted on:2013-04-22

Degree:Doctor

Type:Dissertation

Country:China

Candidate:S Wei

Full Text:PDF

GTID:1228330395980634

Subject:Computer software and theory

Abstract/Summary:

PDF Full Text Request

With the popularity of multimedia applications, more and more processors have beenintegrated with SIMD extensions. As the support for float computation in SIMD extensionsbecome more sophisticated, SIMD extensions are more widely used in high computation field.Different SIMD extensions support various instruction sets, which are close to hardware. So it isdifficult for programmer to write SIMD code by hand, the work has to be realized automatically.Most of the compilers have applied automatic vectorization, such as Gcc, Open64and Icc.Currently there are two mainstream vectorization approaches: traditional vectorization focuson exploiting inter-iteration parallelism, SLP focus on exploiting inner-iteration parallelism.There are both advantages and disadvantages for the two vectorization approaches. Only whenthe advantages of two vectoriation approaches combined, then can loops be vectorized better.Besides, the unaligned/discrete memory access and loops with short dependence distance alsoincur great challenges for vectorization, and there have been no efficient vectorizationapproaches for nested loops. Furthmore, for specific SIMD extensions special vectorizationapproaches and code generation methods are required.This dissertation take CPU SW1600as main platform, bring modified SLP algorithmRLRSLP and ISGSLP based on specific characteristics of SIMD extensions. In order toovercome the disadvantages of vectorization for basic block, add some associate optimizationsfor SLP. An aggressive vectorization scheme is brought out based on the main factors whichinfluence vectorizing loop nest. Permutaion algorithms for SW1600are presented based on thespecific permutation instructions for SW1600.The main contributions are listed as below.1. SLP algorithm is not as efficient as conventional vectorization algorithm when dealingwith some loops due to load redundant elimination. To solve that problem bringforward RLRSLP algorithm based on SLP algorithm which reserve redundant loadstatements. In order to solve the selection problem brought by reserving redundantloads, adopts packing method on the basis of dependence analysis which extendpackset following use-def chains preferentially. Make sure the pack generationaccording to dependence relationship, so there is no need to schedule packs.Furthermore, most SIMD extensions provide perfect vector permutation instructions, sothis dissertation brings out ISGSLP which add isomorphic generation stage to vectorizethe loops with discrete memory accesses.2. Discrete or unaligned memory accesses incur vectorization performance degrade, andSLP can’t vectorize loop perfectly for some special loops such as reduction and loopswith short dependence distance. This dissertation brings out the a mathematic model todescribe array reference and array regroup, in order to do intra-procedural data regroupto avoid incongruous memory reference, it also adopts inter-procedural array padding,loop peeling and optimized code generation to avoid misalignment. As for reduction and loops with short dependence distance, do redundant store elimination and reductiontransform before SLP, and vector operation combination and redundant eliminationafter SLP, as to vectorize loop perfectly.3. The vectorization usually target for innermost loop, there have been no easyvectorization approaches dealing with loop nest. This dissertation brings out anautomatic vectorization approach to vectorize nested loops form outer to inner, firstanalyzes whether the loop can do direct unroll-and-jam through dependency analysis;then collect the values about the loop that will influence vectorization performance,mainly including whether it can do direct unroll-and-jam, the number of arrayreferences that are continuous for this loop index, and the loop region; presents anaggressive algorithm to decide which loops need to do unroll-and-jam, at last generateSIMD code using ISGSLP algorithm.4. Because there exist various vector regroup instructions and the costs of theseinstructions are not equal, it is hard to find an efficient algorithm to realize vectorpermutation. This dissertation analyzes the characteristics of the vector permutationinstructions—shift and insert/extract instructions for SW-1600, designs two algorithmswhich can permutate vectors using least shift or insert/extract instructions, and presentan efficient regroup algorithm using these two types of instructions.

Keywords/Search Tags:

SIMD, RLRSLP, ISGSLP, Loop Nest Vectorization, Associate Optimization, Data Regroup, Vector Permutation

PDF Full Text Request

Related items

1	Research On SIMD Vectorization And Optimization Of Non-Multimedia Applications
2	Research On SIMD Auto-vectorization Optimization Technologies
3	Research Of SIMD Vectorization Optimization Based On Memory Access
4	Research On SIMD Vectorization Of Loop Nests And Its Optimization Techniques
5	Automatic Generation And Optimization Of Data Permutation Instructions For Simd Devices
6	Automatic Generation And Optimization Of Data Permutation Instructions For SIMD Devices
7	Research On Profile-Guided SIMD Vectorization Identification And Optimization
8	Research On Vectorization Method For SIMD Super-long Vector Acceleration Components
9	Implementation And Optimization Of Mixed-radix FFT On Sunway Platform
10	Research On Automatic SIMD Vectorization Recognization And Code Tuning Technology