Research On Loop Transformation Optimization Based On Domestic Shenwei Compiler

Posted on: 2022-12-22  Degree: Master  Type: Thesis
Country: China  Candidate: C X Wang
GTID: 2518306755960849  Subject: Control Engineering
Abstract/Summary:
The research and development of domestic high-performance processors has provided steady impetus for the development of high-performance computer systems in China. As a basic component of a high-performance computer system, the compiler generates code that the processor can execute efficiently. Compiler optimization is the core of the compiler and the key to improving the efficiency of application programs. Loop transformation is an important part of loop processing in compiler optimization, and is usually used to exploit the parallelism, vectorization opportunities, and data locality of loops in applications. Loop unrolling and loop distribution are common loop transformation techniques. Based on these two techniques, this thesis focuses on loop-level optimization in the Sunway compiler in order to improve the execution efficiency of the code it generates. The main contributions are as follows:

1. A method for calculating the loop unrolling factor based on instruction cache and register pressure is proposed. The choice of unrolling factor directly determines how effective loop unrolling is. Unrolling too many times overflows the instruction cache and increases register pressure. Unrolling too few times wastes potential optimization opportunities, so the program does not reach the run-time performance it could. To account for the influence of the instruction cache and register resources on loop unrolling, this thesis takes the instruction cache and register pressure of the Sunway platform into account and implements the unrolling factor calculation method in the Sunway compiler. The experimental results show that, compared with the original unrolling factor calculation method, this method improves the overall performance of the SPEC CPU 2006 benchmarks by 2.7% and the overall performance of the NPB-3.3.1 benchmarks by 5.4%.

2. A topological sorting method based on the Strongly Connected Components Reduced Dependence Graph (SCCRDG) is proposed. Because loop distribution in the compiler is relatively simple and aggressive, the number of loops surges after distribution. This increases the execution overhead of the loops themselves, and data that ends up spread across different loops must be read multiple times, which hinders the reuse of registers and caches. From the perspective of reducing loop overhead, this thesis proposes a topological sorting method based on the SCCRDG. While preserving the correctness of loop distribution, the method selects a topological order of the reduced dependence graph that favors maximal aggregation of nodes, which reduces the number of generated loops and the loop overhead of the original algorithm. The method is implemented in the Sunway compiler and has been verified for correctness and effectiveness on the SPEC CPU 2006 and NPB-3.3.1 benchmarks. The experimental results show that this method reduces the running time of programs and improves their performance. In particular, the performance of 462.libquantum improved by 21% after optimization.
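The abstract does not give the formula behind the first contribution, so the following is only a minimal sketch of the general idea: pick the largest unrolling factor whose unrolled body still fits an instruction cache budget and a register budget. The function name, its parameters, the linear cost model, and the power-of-two restriction are all assumptions made for illustration, not the thesis's method, and the numbers in the usage note are not Sunway hardware figures.

```python
def unroll_factor(body_bytes, icache_bytes, regs_per_copy, shared_regs,
                  avail_regs, max_factor=16):
    """Return the largest power-of-two unrolling factor that fits both budgets.

    Hypothetical cost model (an assumption, not taken from the thesis):
      - code size grows linearly: factor * body_bytes must fit in icache_bytes
      - register demand is shared_regs (loop-invariant values, induction
        variable) plus regs_per_copy for every unrolled copy of the body
    A result of 1 means "do not unroll".
    """
    if body_bytes <= 0 or regs_per_copy < 0 or shared_regs < 0:
        raise ValueError("sizes and register counts must be non-negative, "
                         "and body_bytes must be positive")
    factor = 1
    while factor * 2 <= max_factor:
        candidate = factor * 2
        fits_cache = candidate * body_bytes <= icache_bytes
        fits_regs = shared_regs + candidate * regs_per_copy <= avail_regs
        if not (fits_cache and fits_regs):
            break  # doubling again would overflow one of the two budgets
        factor = candidate
    return factor
```

With made-up inputs such as `unroll_factor(64, 32 * 1024, 3, 4, 32)`, the cache budget would allow far more copies than the register budget does, so registers are the binding constraint. A production compiler would estimate both quantities from its own intermediate representation rather than take them as arguments.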
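The second contribution can be illustrated with a small sketch: condense the statement dependence graph into its strongly connected components, then walk the condensed graph in a topological order that prefers a ready node of the same kind as the loop currently being built, so that more nodes are aggregated into fewer loops. The abstract does not state the exact ordering criterion, so the "serial" versus "parallel" tagging, the greedy tie-breaking, and all function names below are assumptions. The sketch also ignores fusion-preventing dependences, which a real compiler would have to check.

```python
from collections import defaultdict


def strongly_connected_components(nodes, edges):
    """Tarjan's algorithm. Returns a list of SCCs, each a sorted list of nodes."""
    index, low, on_stack, stack, sccs = {}, {}, set(), [], []
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)

    def visit(u):
        index[u] = low[u] = len(index)
        stack.append(u)
        on_stack.add(u)
        for v in succ[u]:
            if v not in index:
                visit(v)
                low[u] = min(low[u], low[v])
            elif v in on_stack:
                low[u] = min(low[u], index[v])
        if low[u] == index[u]:  # u is the root of a component
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == u:
                    break
            sccs.append(sorted(comp))

    for n in nodes:
        if n not in index:
            visit(n)
    return sccs


def distribute(nodes, edges):
    """Return the distributed loops as a list of (kind, [statements])."""
    edge_set = set(edges)
    sccs = strongly_connected_components(nodes, edges)
    comp_of = {s: i for i, comp in enumerate(sccs) for s in comp}
    # Assumption: a component with a dependence cycle is "serial",
    # any other component is "parallel".
    kind = ["serial" if len(c) > 1 or (c[0], c[0]) in edge_set else "parallel"
            for c in sccs]
    # Build the reduced (condensed) dependence graph.
    succ, indeg = defaultdict(set), [0] * len(sccs)
    for u, v in edges:
        cu, cv = comp_of[u], comp_of[v]
        if cu != cv and cv not in succ[cu]:
            succ[cu].add(cv)
            indeg[cv] += 1
    ready = [i for i in range(len(sccs)) if indeg[i] == 0]
    loops = []
    while ready:
        last_kind = loops[-1][0] if loops else None
        # Prefer a ready node that can join the loop currently being built.
        same = [i for i in ready if kind[i] == last_kind]
        pick = min(same or ready, key=lambda i: sccs[i][0])
        ready.remove(pick)
        if kind[pick] == last_kind:
            loops[-1][1].extend(sccs[pick])
        else:
            loops.append((kind[pick], list(sccs[pick])))
        for j in succ[pick]:
            indeg[j] -= 1
            if indeg[j] == 0:
                ready.append(j)
    return loops
```

For example, with statements `1..4`, self-dependences on `2` and `4`, and cross edges `1 -> 3` and `2 -> 4`, a plain source-order walk would alternate kinds and emit four loops, while the greedy order above groups `1` with `3` and `2` with `4` into two loops. The greedy choice is a heuristic and is not guaranteed to minimize the loop count on every graph.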
Keywords/Search Tags: compiler optimization, loop unrolling, loop distribution, Sunway 1621 processor