Font Size: a A A

Research On Key Techniques Of Polyhedral Compiler Optimizations For Heterogeneous Systems

Posted on:2022-03-05Degree:DoctorType:Dissertation
Country:ChinaCandidate:Y Y LiFull Text:PDF
GTID:1488306521457634Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
The continuous changes in computing technology make the computer architecture change with each passing day.A transition from a homogeneous computing model to a heterogeneous computing model is undertaken.The huge difference of architectures and programming models between different processor manufacturers poses a huge challenge for application and promotion of heterogeneous computing.Using compilation technology to automatically convert serial programs into heterogeneous parallel programs is an effective method to solve this problem.The compilation technology based on the polyhedral model is considered to be a research hotspot in the field of automatic program parallelization.This dissertation studies the key issues of parallelization and locality optimization to make full use of the hardware architecture.The main contributions are as follows:(1)Targeting on exploiting parallelization and locality optimization,this thesis studies the principle and process of polyhedral compilation and optimization,and analyzes the main characteristics of the polyhedral model.Comparing with traditional unimodular matrices used in parallelizing compilers,the polyhedral model has a wider range of applications,more powerful representation capabilities and more comprehensive optimization space.At the same time,it also has problems such as high abstraction level and difficulty in implementation.In order to have a comprehensive and in-depth understanding of the polyhedral model,this thesis explains the principle of the the compilation process of the polyhedral model,and then deeply studies the core scheduling transformation algorithm of the polyhedral model.Last but not the least,from the perspective of program parallelism and data locality,it points out the main research content of the polyhedral model.(2)Targeting on exploiting data locality and full tile-level parallelism,this thesis proposes a loop tiling approach for general multi-core homogeneous architecture.Loop tiling is an essential transformation technology for improving data locality in programs.The polyhedral model implements parallelogram tiling,but it fails to exploit full tile-level parallelism between the tiles.The research community designed some complex tiles shapes,including split tiling,diamond tiling and hexagonal tiling,to enable full tile-level parallelism.Among them,diamond tiling and hexagonal tiling approaches are implemented in existing polyhedral compilers.However,split tiling is still considered as foreign to the polyhedral model,since its implementation has to resort to non-affine expressions which are out of the scope of the polyhedral model.This thesis designs a parallelogram-based split tiling algorithm to avoid the problem of traditional splitting and partitioning relying on non-affine expressions,and implements the algorithm in the PPCG compiler.Experiments have tested different types of stencil computations.The experimental results show that the Open MP parallel code generated by the PPCG compiler using our proposed approach has a 2% increase in performance compared to the code generated by the current best diamond tiling algorithm and a 91% performance improvement compared with stencil code generated by the domain-specific compiler Pochoir.(3)Targeting on the code generation of split tiling for different architectures while minimizing the synchronization cost,this thesis proposes a loop tiling approach for GPU architecture.On the one hand,the implementations of diamond tiling only target on CPU architectures,while the current version of hexagonal tiling can only support code generation for GPUs.Users have to switch between different tiling techniques when experimenting on different architectures.On the other hand,complex tiles shapes implement tile-wise concurrent start at the expense of additional barriers/synchronizations.Based on the algorithm of split tiling on CPU,this thesis presents a split tiling approach for GPU architectures to automatically generate CUDA code with minimum synchronization cost,realizing the mapping from loop level after tiling to GPU hardware,and the approach is implemented in PPCG.Compared with diamond tiling,our proposed approach supports different dimensions of tile size;compared with hexagonal tiling,the proposed approach can handle multiple sentences,symbolic constant loop boundaries and other complex situations.Experiments have tested different types of stencil calculations,and the results show that our proposed approach can obtain a mean speedup of 1.64× over the current most widely used hexagonal tiling.(4)Targeting on exploiting multi-dimensional parallelism,this thesis proposes a parallelism recognition method oriented to hardware scale.With the continuous increase of core numbers in modern processor architectures,traditional single-dimensional parallel recognition methods cannot provide sufficient parallelism.This thesis proposes a cyclic multi-dimensional parallel recognition method oriented to the number of resources on the target platform.According to the relationship between the number of parallel layer iterations and the number of hardware resources provided by the target platform,multiple dimensions of nested loops are dynamically identified as parallel layers,and the iteration spaces of multiple parallel dimensions are merged and then divided into tasks to achieve full utilization of the hardware resources of the target architecture.This method is implemented in PPCG and tested by core calculation programs such as matrix multiplication and laplace equation.Compared with the existing single-dimensional parallel methods,the performance of the method proposed in this thesis improves 1.8 times on SW26010 heterogeneous many-core processors and improves 5.2 times on Nvidia GPUs.This dissertation exploits automatic parallelization for heterogeneous systems based on the polyhedral compilation,supporting the code generation of different parallel programming models,including Open MP directives,CUDA and Open CL APIs.By effectively exploiting loop parallelism,while taking into account the characteristics of the data locality of the target architecture,it provides support for generating more efficient parallel code.
Keywords/Search Tags:polyhedral model, heterogeneous systems, parallelization, locality, split tiling, multi-dimensional parallelism recognition
PDF Full Text Request
Related items