
Research on the Optimization of BLAS Level-1 and Level-2 Functions Based on the Shenwei 1621 Platform

Posted on: 2022-08-15
Degree: Master
Type: Thesis
Country: China
Candidate: H R Li
GTID: 2518306494950229
Subject: Computer application technology
Abstract/Summary:
With the development of high-performance processors, high-performance application software has appeared in rapid succession. BLAS (Basic Linear Algebra Subprograms) is the foundation of a wide range of high-performance application software, and its performance often determines how well that software performs. Every major processor vendor now provides a BLAS library built for its own architecture. Sunway, a representative domestic high-performance processor with independent intellectual property rights, is widely used in many fields. However, because its software ecosystem is still incomplete, much of the software on the platform has not been performance-optimized. As one of the key pieces of basic computing software, BLAS urgently needs a high-performance implementation deeply optimized for the Sunway processor platform. In this thesis, the level-1 and level-2 BLAS functions are designed and optimized for the Sunway 1621 processor, taking the structural characteristics of the Sunway processor into account.

The thesis first analyzes the Sunway 1621 processor architecture, in particular the pipeline features that most affect function performance and the characteristics of the memory hierarchy. It then introduces commonly used optimization methods and analyzes them in the context of the Sunway 1621 architecture. Because BLAS optimization research has mostly focused on other parts of the library, this study takes the relatively less-studied level-1 and level-2 BLAS functions as its optimization targets. The main research contents and contributions are as follows:

1) Based on the characteristics of the level-1 BLAS functions, their optimization was focused on memory access. SIMD vectorization instructions were used to increase data-level parallelism, and loop unrolling, instruction reordering, and related techniques were used to reduce dependences between data and between instructions, thereby improving pipeline parallelism. For multithreading, an automatic thread-allocation scheme was designed to choose the number of threads to launch. In the end, the level-1 BLAS functions achieved a single-core speedup of 4.36 and a multi-core speedup of 9.50.

2) Because the level-2 BLAS functions access the matrix only once, and taking into account how matrices are stored on the Sunway 1621 processor, this thesis designs a data-access pattern that combines row-major storage with column-major access. Loop unrolling first reduces the number of memory accesses; the SIMD vectorization provided by the Sunway 1621 processor then improves data-processing efficiency, while instruction reordering eliminates dependences between instructions. For small matrices, replacing the memory-allocation function improves the overall performance of the functions. Altogether, the performance of the level-2 BLAS functions improves by more than 25%.

The optimization methods proposed in this thesis provide important guidance for implementing high-performance BLAS on the Sunway architecture platform.
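The level-1 strategy in contribution 1), loop unrolling combined with instruction reordering to break data dependences, can be illustrated with a dot product (the standard BLAS routine DDOT). The sketch below is a generic portable C illustration under that assumption, not the thesis's Sunway code; the function name `ddot_unrolled` and the unroll factor of 4 are hypothetical choices. Four independent partial sums mean consecutive multiply-adds no longer wait on a single accumulator, which is what lets the pipeline overlap them.

```c
#include <stddef.h>

/* Dot product x . y with 4-way loop unrolling (hypothetical helper).
 * s0..s3 are independent accumulators: each multiply-add depends only on
 * its own previous value, so four can be in flight in the pipeline at once
 * instead of forming one serial dependence chain. */
double ddot_unrolled(size_t n, const double *x, const double *y)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;

    /* Main unrolled body: 4 elements per iteration, one loop branch per 4. */
    for (; i + 4 <= n; i += 4) {
        s0 += x[i]     * y[i];
        s1 += x[i + 1] * y[i + 1];
        s2 += x[i + 2] * y[i + 2];
        s3 += x[i + 3] * y[i + 3];
    }

    /* Tail loop for the n % 4 leftover elements. */
    for (; i < n; ++i)
        s0 += x[i] * y[i];

    /* Combine the partial sums pairwise. */
    return (s0 + s1) + (s2 + s3);
}
```

Because the partial sums change the order of floating-point additions, results can differ from a naive loop in the last bits; that trade-off is usual for unrolled reductions.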
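The SIMD part of contribution 1) can be sketched with an AXPY update (y := alpha*x + y, the BLAS routine DAXPY). This is a minimal sketch assuming a GCC-compatible compiler and its generic vector extensions, with a 4-double (256-bit) vector width assumed for illustration; on the real Sunway 1621 toolchain the vendor's own SIMD intrinsics and actual vector width would take the place of the `v4df` type, and the name `daxpy_simd` is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Assumed vector width: 4 doubles = 32 bytes (256 bits). GCC/Clang vector
 * extension; element-wise arithmetic on this type maps to SIMD instructions
 * when the target supports them. */
typedef double v4df __attribute__((vector_size(32)));

/* y := alpha * x + y (hypothetical SIMD sketch of DAXPY). */
void daxpy_simd(size_t n, double alpha, const double *x, double *y)
{
    v4df va = {alpha, alpha, alpha, alpha}; /* broadcast alpha to all lanes */
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        v4df vx, vy;
        /* memcpy loads avoid assuming the arrays are 32-byte aligned. */
        memcpy(&vx, x + i, sizeof vx);
        memcpy(&vy, y + i, sizeof vy);
        vy += va * vx;                      /* 4 multiply-adds at once */
        memcpy(y + i, &vy, sizeof vy);
    }

    /* Scalar tail for the n % 4 leftover elements. */
    for (; i < n; ++i)
        y[i] += alpha * x[i];
}
```

Each loop iteration reads 8 doubles and writes 4, so for large n the kernel is limited by memory bandwidth rather than arithmetic, which is consistent with the abstract's focus on memory access for level-1 functions.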
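The abstract does not describe how its automatic thread-allocation scheme chooses a thread count. The sketch below shows one common heuristic under that caveat: give each thread at least a minimum amount of work so that thread start-up cost does not dominate, cap the count at the available cores, and split the vector into contiguous chunks. The threshold `MIN_ELEMS_PER_THREAD`, its value, and the helper names `choose_threads` and `chunk_range` are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical threshold: with fewer elements than this per thread, the cost
 * of creating and joining a thread outweighs the work it would perform. */
#define MIN_ELEMS_PER_THREAD 8192u

/* Choose a thread count for a level-1 kernel on n elements, capped at
 * max_threads. Always returns at least 1. */
unsigned choose_threads(size_t n, unsigned max_threads)
{
    if (max_threads == 0)
        return 1;
    size_t t = n / MIN_ELEMS_PER_THREAD;
    if (t < 1)
        t = 1;
    if (t > max_threads)
        t = max_threads;
    return (unsigned)t;
}

/* Split [0, n) into nthreads (>= 1) contiguous chunks and return chunk tid
 * as the half-open range [*begin, *end). The first n % nthreads chunks get
 * one extra element, so chunk sizes differ by at most one. */
void chunk_range(size_t n, unsigned nthreads, unsigned tid,
                 size_t *begin, size_t *end)
{
    size_t base = n / nthreads;
    size_t rem = n % nthreads;
    *begin = tid * base + (tid < rem ? tid : rem);
    *end = *begin + base + (tid < rem ? 1 : 0);
}
```

For example, a 100-element vector runs on one thread, while a vector of 4 × 8192 elements gets four threads; each thread would then apply a kernel such as an unrolled or vectorized AXPY to its own chunk.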
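For the level-2 functions in contribution 2), the idea that loop unrolling reduces memory accesses can be shown with a matrix-vector product y := A*x (the BLAS routine DGEMV) on a row-major matrix. This is a minimal sketch of one plausible reading of the abstract's "row-major storage, column-major access" pattern, not the thesis's actual kernel: four rows are processed together and walked column by column in lockstep, so each `x[j]` is loaded once and reused four times. The name `dgemv_rowmajor` and the unroll factor of 4 are hypothetical.

```c
#include <stddef.h>

/* y := A * x, where A is an m-by-n row-major matrix with leading dimension
 * lda (lda >= n), so element (i, j) is A[i * lda + j]. Hypothetical sketch. */
void dgemv_rowmajor(size_t m, size_t n, const double *A, size_t lda,
                    const double *x, double *y)
{
    size_t i = 0;

    /* Unroll over rows: handle 4 rows per pass. */
    for (; i + 4 <= m; i += 4) {
        const double *a0 = A + i * lda;
        const double *a1 = a0 + lda;
        const double *a2 = a1 + lda;
        const double *a3 = a2 + lda;
        double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;

        /* Step through the 4 rows column by column. */
        for (size_t j = 0; j < n; ++j) {
            double xj = x[j];   /* one load of x[j], four uses */
            s0 += a0[j] * xj;   /* the four accumulators are independent */
            s1 += a1[j] * xj;
            s2 += a2[j] * xj;
            s3 += a3[j] * xj;
        }
        y[i] = s0;
        y[i + 1] = s1;
        y[i + 2] = s2;
        y[i + 3] = s3;
    }

    /* Leftover rows when m is not a multiple of 4. */
    for (; i < m; ++i) {
        const double *a = A + i * lda;
        double s = 0.0;
        for (size_t j = 0; j < n; ++j)
            s += a[j] * x[j];
        y[i] = s;
    }
}
```

Since every element of A is read exactly once whatever the loop order, the savings come from the vector x: the unrolled loop reads it m/4 times instead of m times, and the independent accumulators also remove the serial dependence a single running sum would create.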
Keywords/Search Tags: Sunway 1621, BLAS, Loop unrolling, SIMD vectorization, Instruction rearrangement