Research on Automatic SIMD Vectorization Recognition and Code Tuning Technology

Posted on: 2013-08-24  Degree: Doctor  Type: Dissertation
Country: China  Candidate: Y Yao  Full Text: PDF
GTID: 1228330395980635  Subject: Computer software and theory
Abstract/Summary:
Multimedia applications are typically compute-intensive and highly parallel. SIMD functional units and SIMD instruction set extensions can speed up such programs. As multimedia programs have grown more complex, SIMD multimedia instruction sets have been greatly expanded, and the use of SIMD extensions to accelerate applications has moved beyond the traditional multimedia domain into scientific computing. Although many commercial compilers can vectorize programs automatically, the performance of the generated code is generally not high. SIMD compiler optimization still faces many difficulties: references to structures and pointers, data dependence testing, and the conversion of control dependences all seriously hinder the compiler's ability to find vectorization opportunities. To improve SIMD vectorization recognition and to further exploit and optimize the vector parallelism in applications, this thesis develops SW-VEC, a source-to-source automatic vectorization tool for the SIMD multimedia extensions of the domestic high-performance multi-core processor SW1600. The tool analyzes, optimizes, and restructures serial source code to expose SIMD data parallelism, and automatically generates efficient SIMD source code suited to the architecture of the SW1600 multi-core processor.

This dissertation studies the key techniques in building a vectorization tool that affect vectorization recognition and performance optimization. It focuses on pre-optimization for vectorization, SIMD dependence analysis, the exploitation of SIMD parallelism, and performance optimization at the various stages of vectorization.
To meet the needs of the project, the thesis also constructs an interactive performance tuning framework for SIMD vectorized code, which covers the interface design, the acquisition of feedback tuning information, and the implementation of compiler directive statements. The innovations of the dissertation lie in the following four aspects:

1. The thesis discusses continuity and alignment analysis and optimization in the pre-optimization stage, and proposes continuity and alignment analysis and optimization methods for structures and pointers. Existing SIMD vectorization methods for structures and pointers mainly rely on local or global array restructuring, which defines a new storage structure and copies the structure or pointer data into the new storage space to change the original layout. This approach incurs additional space and time overhead and degrades the performance of the generated vectorized code. The dissertation therefore proposes rearranging the structure members: by checking for isomorphic statements among the structure member references, the method reorders and pads the structure members so that the structure becomes contiguous and aligned. For pointers in the program, the thesis proposes tracking and recording pointer references to determine their continuity and alignment relationships. The experimental results show that structure member rearrangement and pointer continuity and alignment analysis effectively improve the vectorization recognition rate and the performance of the generated vectorized code. Compared with local data restructuring, structure member rearrangement turns the original slowdown on the core code of the test programs into a speedup of more than 300%.
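The structure member rearrangement described above can be illustrated with a small sketch. It is not taken from SW-VEC: the struct names, the members, and the 16-byte vector width are hypothetical, and the layout comments assume 4-byte `float` and `int`. The four `float` members take part in isomorphic statements, so the rearranged layout places them next to each other and pads the struct to a multiple of 16 bytes. A 16-byte `memcpy` stands in for one vector load or store.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical original layout: the four floats used by isomorphic
   statements are interrupted by an unrelated int member. */
struct particle_orig {
    float x;
    int   id;
    float y;
    float z;
    float w;
};

/* Hypothetical rearranged layout: the isomorphic members are adjacent,
   and a filler pads the struct (32 bytes with 4-byte float and int) so
   that, given a 16-byte-aligned array base, x starts on a 16-byte
   boundary in every element. */
struct particle_re {
    float x, y, z, w;
    int   id;
    int   pad[3];
};

/* The four floats really are contiguous in the rearranged layout. */
static_assert(offsetof(struct particle_re, w) ==
              offsetof(struct particle_re, x) + 3 * sizeof(float),
              "x, y, z, w must be contiguous");

/* Four isomorphic scalar statements on the original layout. */
void scale_orig(struct particle_orig *p, size_t n, float s) {
    for (size_t i = 0; i < n; i++) {
        p[i].x *= s;
        p[i].y *= s;
        p[i].z *= s;
        p[i].w *= s;
    }
}

/* The same work on the rearranged layout, written as one 4-wide
   operation per element. */
void scale_re(struct particle_re *p, size_t n, float s) {
    for (size_t i = 0; i < n; i++) {
        float lane[4];
        char *base = (char *)&p[i] + offsetof(struct particle_re, x);
        memcpy(lane, base, sizeof lane);            /* models a vector load     */
        for (int k = 0; k < 4; k++) lane[k] *= s;   /* models a vector multiply */
        memcpy(base, lane, sizeof lane);            /* models a vector store    */
    }
}
```

Unlike array restructuring, this layout change needs no copy into a new storage area at run time; whether a given compiler or tool can exploit it automatically is a separate question.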
Whereas the Intel compiler was unable to vectorize the pointer code in the test programs, the pointer continuity and alignment analysis enables SIMD vectorization of the pointer code and achieves a speedup of 7% to 43%.

2. For SIMD data dependence analysis, the thesis proposes a formal method based on the data dependence distance and the vectorization factor. On this basis, it proposes an anti-dependence elimination algorithm for the identified dependence cycles and implements loop distribution. For control flow structures inside loops, the thesis proposes a SIMD vectorization method based on the control dependence graph. The method creates execution-variable arrays according to the extended control dependence graph and stores in them the results of the conditional expressions. With this method, assignments to the variable arrays can be vectorized automatically, and comparisons on the variable arrays can be vectorized in multiple versions. This minimizes the impact of the uncertainty of the variable arrays, broadens the range of code that can be recognized as vectorizable, and effectively improves program performance. The experimental results show that the proposed SIMD dependence analysis and the optimization and transformation of control flow structures effectively improve the vectorization recognition rate and program performance. Compared with version 11.0 of the Intel compiler, SW-VEC improves the speedup by up to about 35%, and by about 21% on average.
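The relationship between dependence distance and vectorization factor can be made concrete with a minimal sketch. The abstract does not give the thesis's formal method, so the code below assumes the standard conservative rule: a loop-carried flow dependence of distance d allows a vectorization factor vf when d >= vf. All function names are hypothetical. `run_chunked` models SIMD execution by loading a whole chunk before storing it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_VF 16

/* Assumed legality rule (not the thesis's formalization): distance 0 means
   no loop-carried dependence; otherwise the distance must be at least vf. */
int vf_is_legal(size_t distance, size_t vf) {
    return distance == 0 || distance >= vf;
}

/* Scalar reference loop with a flow dependence of distance d. */
void run_scalar(int *a, size_t n, size_t d) {
    for (size_t i = d; i < n; i++)
        a[i] = a[i - d] + 1;
}

/* Models vector execution: each chunk reads vf elements, then writes vf. */
void run_chunked(int *a, size_t n, size_t d, size_t vf) {
    assert(vf >= 1 && vf <= MAX_VF);
    int tmp[MAX_VF];
    size_t i = d;
    for (; i + vf <= n; i += vf) {
        for (size_t k = 0; k < vf; k++)
            tmp[k] = a[i - d + k] + 1;            /* vector load and add */
        memcpy(&a[i], tmp, vf * sizeof tmp[0]);   /* vector store        */
    }
    for (; i < n; i++)                            /* scalar remainder    */
        a[i] = a[i - d] + 1;
}
```

With d = 4 and vf = 4 the chunked loop matches the scalar loop; with d = 1 and vf = 4 each chunk reads elements that the scalar loop would already have updated, so the results differ.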
Compared with vectorization that does not use the data dependence and control dependence analysis and optimization proposed in this thesis, SW-VEC improves the speedup by up to about 30%, and by about 17% on average.

3. For the exploitation of SIMD parallelism and the performance optimization of vectorized code, the thesis proposes cost-benefit calculation methods for loop transformation and for basic-block SLP parallelism exploitation, according to the different stages of SIMD parallelism exploitation. These methods guide the choice among the options for SIMD vectorization and performance optimization in the two-stage transformation. For the generated vectorized code, the thesis proposes optimizations for vector register reuse through loop interchange and loop unrolling. The former identifies vector data references that are independent of a loop index and, through loop interchange, moves loop-invariant vector computations from the inner loop to the outer loop, which eliminates redundant vector loads and computations in the inner loop and improves vector register reuse. The latter compares the loop dependence distance with the vectorization factor and reuses vector registers in whole or in part when there is a loop-carried flow dependence, output dependence, or input dependence. In addition, to exploit the parallelism between the SIMD functional units and the scalar functional units, the thesis proposes a method for mixing vector and scalar parallelism. The method unrolls the loop in segments and reorders the statements in the loop, dividing them into a vector part and a scalar part.
If there are no dependences between the two parts, the two kinds of statements can execute in parallel on the SIMD vector functional units and the scalar functional units, which improves the utilization of system resources. The experimental results show that vector register reuse optimization combined with cost-benefit analysis improves the performance of the vectorized code. Using the segmented loop unrolling algorithm, the thesis implements mixed vector and scalar parallelization, which effectively improves the performance of the generated code; the average speedup increases by about 12%.

4. The thesis constructs an interactive framework for tuning the performance of vectorized code. The framework combines three parts: the vectorization tuning window interface, static and feedback analysis and tuning for SIMD vectorization recognition, and the insertion of pragma statements. Within the tuning window interface, the framework combines vectorized code generation by static analysis with performance tuning by dynamic feedback and, together with a complete and well-specified set of vectorization compiler directives, effectively improves the performance of the generated vectorized code. The experimental results show that the framework effectively improves the performance of some of the test programs in the SPEC CPU2000 test suite: the maximum improvement is about 50%, and the average speedup from the optimization increases by about 10%.

Finally, the thesis tests the overall SIMD vectorization recognition rate and the performance of the vectorized code generated by the SW-VEC tool described in the dissertation. The experimental results show that the automatic vectorization recognition rate of SW-VEC is better than that of the Intel compiler version 11.0, and that its performance speedup is about 16% higher than that of the Intel compiler.
For the high-performance application test suite, the speedup obtained with interactive vectorization performance tuning comes fairly close to that of manually rewritten vectorized programs: on average it reaches more than 90% of the speedup of the manually rewritten code, indicating that the vectorization performance tuning framework is practical.
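The vector register reuse through loop interchange mentioned in the third contribution can be sketched as follows. This is an illustration under assumptions, not SW-VEC output: the kernel, the function names, and the vectorization factor of 4 are hypothetical, and a small struct stands in for a vector register. The vector sum of `a` and `b` does not depend on the row index `i`, so after interchange it is loaded and computed once per column chunk instead of once per row. Both functions return the number of modeled vector loads.

```c
#include <assert.h>
#include <stddef.h>

#define VF 4  /* assumed vectorization factor */

typedef struct { int lane[VF]; } vec_t;  /* stands in for a vector register */

static vec_t vload(const int *p) {
    vec_t v;
    for (size_t k = 0; k < VF; k++) v.lane[k] = p[k];
    return v;
}

/* Rows outer, column chunks inner: a and b are reloaded for every row. */
size_t kernel_baseline(int *out, const int *a, const int *b, const int *w,
                       size_t m, size_t n) {
    assert(n % VF == 0);
    size_t loads = 0;
    for (size_t i = 0; i < m; i++) {
        for (size_t j = 0; j < n; j += VF) {
            vec_t va = vload(a + j), vb = vload(b + j);
            loads += 2;
            for (size_t k = 0; k < VF; k++)
                out[i * n + j + k] = (va.lane[k] + vb.lane[k]) * w[i];
        }
    }
    return loads;
}

/* After interchange: the row-invariant vector sum is hoisted and reused. */
size_t kernel_interchanged(int *out, const int *a, const int *b, const int *w,
                           size_t m, size_t n) {
    assert(n % VF == 0);
    size_t loads = 0;
    for (size_t j = 0; j < n; j += VF) {
        vec_t va = vload(a + j), vb = vload(b + j);
        loads += 2;
        vec_t vsum;
        for (size_t k = 0; k < VF; k++) vsum.lane[k] = va.lane[k] + vb.lane[k];
        for (size_t i = 0; i < m; i++)
            for (size_t k = 0; k < VF; k++)
                out[i * n + j + k] = vsum.lane[k] * w[i];
    }
    return loads;
}
```

The interchange also changes the store pattern on `out` from row-wise to strided, which may cost cache locality; a trade-off of this kind is presumably what a cost-benefit calculation has to weigh.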
Keywords/Search Tags: SIMD automatic vectorization, continuity analysis, alignment analysis, dependency analysis, control dependence, vector register reuse, interactive performance tuning, feedback tuning, pragmas