Transplantation And Optimization Of Auto-vectorization Of LLVM Compiler Based On Sunway Processor

Posted on: 2022-09-01
Degree: Master
Type: Thesis
Country: China
Candidate: J N Li
GTID: 2518306326996079
Subject: Master of Engineering
Abstract/Summary:
Automatic vectorization makes effective use of the underlying SIMD hardware and has become an important means of improving the performance of scientific computing programs, as well as a research hotspot in compiler optimization. However, because vendors have followed different technical routes, the SIMD extension instruction sets of different microprocessors vary greatly. To exploit the hardware fully, automatic vectorization must adapt its algorithms and tune its parameters to the specific SIMD extension. In addition, control flow inside loops, an insufficient number of isomorphic statements in a basic block, and the randomness of the packing method all limit how much parallelism vectorization can discover, so targeted optimization is needed to expose the potential parallelism. The main contributions of this thesis are as follows.

Research on LLVM's automatic vectorization technology. The open-source LLVM compiler has an initial implementation of two automatic vectorization methods, one at the loop level and one at the basic block level. Both are current research hotspots in automatic vectorization. This thesis analyzes the implementation of LLVM's loop-level and basic block-level automatic vectorization modules from three aspects (legality analysis, vector discovery, and vector code generation) and constructs an LLVM automatic vectorization flow suitable for the Sunway platform, laying the foundation for porting the automatic vectorization modules.

Porting and optimization of loop vectorization. Because of differences in vector length and instruction set functionality, the automatic vectorization flow for the Sunway platform differs in many ways from that of open-source LLVM. This thesis ports loop-level vectorization to the Sunway platform from two aspects: vector register length and vectorization information. To address the Sunway platform's lack of mask instructions, a mask instruction conversion algorithm based on control flow analysis is proposed. Tests on the TSVC standard test set show that, with the improved algorithm, the recognition rate of control-flow vectorization increases by 48% and the average speedup increases by 60%. To address the limited variety of control-flow vectorization methods, a phi node optimization algorithm is proposed that uses the vector select instruction to strengthen control-flow vectorization. After optimization, the speedup of the s441 test case in the TSVC test set reaches 2.4.

Porting and optimization of basic block vectorization. LLVM's existing implementation of basic block-level vectorization is not suitable for the Sunway platform. This thesis therefore ports basic block-level vectorization to the Sunway platform from two aspects: the constraint on the number of isomorphic statements and the evaluation of vector instruction cost. To address the poor benefit of vector code caused by the random packing of isomorphic statements in basic block-level vectorization, a packing algorithm based on vector instruction cost evaluation is proposed. With the improved algorithm, the speedup of the 453.povray test case in the SPEC standard test set improves by 17.1%. To address LLVM's insufficient ability to discover vectorization opportunities, a hybrid optimization method that combines intra-iteration and inter-iteration vectorization discovery is proposed. After optimization, the average speedup of typical sample programs reaches 2.04.

The work in this thesis is based on LLVM 7.0.0 and has been implemented on the Sunway 1621 processor. Overall tests on the SPEC CPU2006 and TSVC standard test sets verify the correctness of the porting and optimization. The average performance of the fixed-point (integer) programs in SPEC CPU2006 increases by 2.7%, that of the floating-point programs by 18.2%, and the overall average by 11.3%. The average speedup of the matrix multiplication test cases reaches 7.2, which verifies the effectiveness of the porting and optimization. The results of this thesis have been applied to the Sunway platform, where they make effective use of the processor's vector instructions and improve the performance delivered by the SIMD extension units.
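The idea of vectorizing control flow on a target without mask instructions can be illustrated with a small sketch. It is not the conversion algorithm of the thesis, only the general if-conversion pattern such an algorithm would target: a guarded store is replaced by a full-width load, a lane-wise select, and an unmasked store. The function names, the lane count `VL`, and the lane-by-lane emulation in plain C are all assumptions made for illustration; they do not reflect the Sunway vector width or instruction set.

```c
#include <stddef.h>

#define VL 4 /* hypothetical lane count, chosen only for illustration */

/* Scalar reference: the store to a[i] is guarded by control flow. */
void guarded_add_scalar(float *a, const float *b, const float *c, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (c[i] > 0.0f)
            a[i] = a[i] + b[i];
    }
}

/* If-converted form: every lane computes the sum, then a select keeps
 * either the new or the old value, so no masked store is required.
 * Each inner loop stands in for one vector instruction. */
void guarded_add_select(float *a, const float *b, const float *c, size_t n) {
    size_t i = 0;
    for (; i + VL <= n; i += VL) {
        float old[VL], sum[VL];
        int take[VL];
        for (int l = 0; l < VL; l++) {      /* vector load and compare */
            old[l] = a[i + l];
            take[l] = c[i + l] > 0.0f;
        }
        for (int l = 0; l < VL; l++)        /* vector add on all lanes */
            sum[l] = old[l] + b[i + l];
        for (int l = 0; l < VL; l++)        /* select, then full-width store */
            a[i + l] = take[l] ? sum[l] : old[l];
    }
    for (; i < n; i++) {                    /* scalar remainder loop */
        if (c[i] > 0.0f)
            a[i] = a[i] + b[i];
    }
}
```

Writing the old value back on inactive lanes is only safe when every element of `a` touched by the vector body is known to be writable and no other thread updates it concurrently, which is why a compiler has to establish legality before applying this kind of transformation.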
Keywords/Search Tags:LLVM, Auto-vectorization, Sunway, Loop-level, Basic block-level