Research On DCU-oriented Polyhedron Compiler Optimization Technology

Posted on:2022-01-22

Degree:Master

Type:Thesis

Country:China

Candidate:W F Hu

Full Text:PDF

GTID:2518306323991749

Subject:Software engineering

Abstract/Summary:

As CPU-GPU heterogeneous computing systems are widely used in image processing, high-performance computing, deep learning and other fields, efficient compilation optimization for GPU architectures has become one of the most important ways to improve the performance of such systems. To address several problems in the compilation system of the DCU (Deep Compute Unit) platform, this thesis studies DCU-oriented polyhedral compilation optimization technology. The main research contents and innovations of the thesis are as follows.

1. Based on the open-source compiler PPCG, a DCU-oriented polyhedral source-to-source compilation framework is constructed. The framework automatically extracts the static control parts of a program by detecting code fragments that satisfy the affine restrictions, and schedules the execution order of statements. According to the schedule, the framework generates HIP code that can be compiled and run on the host and the acceleration device of the DCU platform. Experimental results show that, on the Polybench test suite, the automatically generated HIP parallel code reaches on average 1.14 times the performance of the multi-core CPU, with a highest speedup of 3.3×.

2. A DCU kernel partition method based on occupancy is proposed. Following the principle of maximizing the occupancy of kernel functions, this method partitions the kernel functions by restricting the loop fusion of polyhedral statements. During the scheduling of the outer parallel loops, the occupancy-based loop fusion restrictions ensure that loop fusion does not decrease occupancy, and the occupancy analysis information can also be used to select the thread block size. In test cases that are sensitive to the kernel partitioning method, this method effectively improves the performance of the generated DCU code. Compared with a variety of existing loop fusion strategies, this method partitions the DCU kernels with the highest occupancy and obtains a performance improvement ranging from 10.3% to 50.4%.

3. A global memory access optimization method based on polyhedral data reuse analysis is proposed. On the schedule tree, this method analyzes two types of data reuse in the DCU kernel and calculates the global memory access cost under different loop orders. Based on these costs, the method applies loop permutation to the schedule to increase the proportion of global memory accesses that can be combined (coalesced), which greatly improves the efficiency of global memory access. Experimental results show that this method effectively finds a loop order with a lower global memory access cost within the legal loop permutation space. Compared with the DL cost model in PolyAST, it achieves an average performance improvement of 12.4% on the Polybench test suite.

In addition, Polybench-GPU, the GPU version of Polybench, is ported to the DCU platform. The HIP code generated by the optimized source-to-source compiler reaches 92.4% of the Polybench-GPU performance. The experimental results demonstrate the effectiveness of the proposed DCU kernel partition method and global memory access optimization method.
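
The idea behind the third contribution, comparing the global memory access cost of different loop orders, can be illustrated with a small sketch. This is not the cost model of the thesis; it is a simplified stand-in that only counts how many memory transactions one group of threads needs for each array access, depending on which loop is mapped to adjacent threads. The wavefront size, the transaction size, the array row length, the matrix-multiplication access pattern and all function names are assumptions chosen for illustration, and the legality of each candidate loop is assumed to have been checked elsewhere.

```python
WAVEFRONT = 64   # assumed number of threads that issue a memory instruction together
SEGMENT = 16     # assumed number of array elements served by one memory transaction
N = 1024         # assumed row length of each row-major 2-D array

# Accesses of C[i][j] += A[i][k] * B[k][j] as (array, row subscript, column subscript).
ACCESSES = [("C", "i", "j"), ("A", "i", "k"), ("B", "k", "j")]


def stride(access, thread_loop):
    """Element distance between the addresses touched by two adjacent threads."""
    _, row, col = access
    return N * (row == thread_loop) + (col == thread_loop)


def transactions(access, thread_loop):
    """Number of distinct memory segments one wavefront touches for this access."""
    s = stride(access, thread_loop)
    return len({(t * s) // SEGMENT for t in range(WAVEFRONT)})


def cost(thread_loop):
    """Total transactions per wavefront when `thread_loop` varies across threads."""
    return sum(transactions(a, thread_loop) for a in ACCESSES)


def best_thread_loop(candidates):
    """Pick the candidate loop with the lowest estimated global memory cost."""
    return min(candidates, key=cost)


if __name__ == "__main__":
    for loop in ("i", "j"):
        print(loop, cost(loop))
    print("chosen:", best_thread_loop(["i", "j"]))
```

Under these assumptions, mapping loop `j` to adjacent threads makes the accesses to `C` and `B` contiguous and the access to `A` uniform, so it needs far fewer transactions than mapping loop `i`, whose accesses to `C` and `A` jump a full row between neighbouring threads.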

Keywords/Search Tags:

DCU, Polyhedron model, Loop transformation, Automatic parallelization, Data access optimization

Related items

1	Research On Parallelism Speculation And Polyhedron Compilation Technology Based On Artificial Intelligence
2	Research On SIMD Vectorization Of Loop Nests And Its Optimization Techniques
3	Research On Automatic Parallelization And Optimization Technologies For Shared Memory Architecture
4	Automatic Parallelization And Implementation Of Loop Nest Based On MPI Platform
5	The Research Of Parallelization Technique Based On Shared Memory Structure And Optimization
6	Automatic Parallelization For Seismic Data Processing Programs On Grid Environment
7	Parallelization & checkpointing of GPU applications through program transformation
8	Research On SIMD Compilation Technology Based On Polyhedron Model
9	Research And Realization Of Automatic And Optimal Transformation Of C Language For Heterogeneous Reconfigurable Processor
10	Research On Loop Optimization Based On Polyhedral Model