Automatic Offloading And Optimization Of OpenMP Programs For Heterogeneous Platforms

Posted on:2021-04-19

Degree:Master

Type:Thesis

Country:China

Candidate:H N Guo

Full Text:PDF

GTID:2428330611498198

Subject:Software engineering

Abstract/Summary:

With the rapid development of the information society, the volume of data that the Internet must process is growing quickly, and so is the demand for data processing and computation, which in turn drives the rapid development of high-performance computing. CPUs alone are increasingly unable to meet the performance requirements of high-performance computing programs, and heterogeneous computing systems composed of CPUs and accelerators have become a new choice. Porting complex computing programs to accelerators such as GPUs can not only improve computing performance and shorten computing time, but also exploit the accelerator's computing advantages and reduce energy consumption.

To address the porting of parallel programs, this thesis proposes an automatic conversion and optimization scheme for parallel programs targeting heterogeneous platforms. In essence, it is an automatic source-to-source compiler, named OpenMP Automated Offloading, or OAO for short. For OpenMP programs, the compiler system consists of an information collection module, a data transmission module, and a dynamic loading module. The information collection module uses Clang, a front end of the LLVM compiler, to analyze the source program. This module also introduces the concept of a serial-parallel graph, which divides the program into serial and parallel domains. To keep data consistent during transfers, the data transmission module introduces a self-designed data transmission model and optimizes transfers so that only the minimum necessary data is moved. In addition, unified memory is used to support complex data structures. In the dynamic loading module, a runtime API is introduced to minimize modifications to the source program, eliminate redundant data transfers, and complete the automatic source-to-source conversion.

The OAO compiler is system-tested on RTX 2080 Ti and K40 experimental platforms, and its performance is evaluated on benchmarks from a public benchmark suite. The experimental results show that the OAO compiler correctly converts all 23 benchmarks and achieves a significant speedup on 15 of them. The larger the amount of data, the more pronounced the performance improvement, and the maximum speedup is 20 times. This thesis also compares OAO with another well-known source-to-source compiler and with manual offloading; the experiments show that OAO outperforms both. The runtime introduced by the dynamic loading module adds a time overhead of less than 0.1% of the total program run time. In addition, by using unified memory, the OAO compiler provides good support for complex data structures. Overall, testing shows that the OAO compiler system designed in this thesis can effectively convert parallel programs and significantly improve their performance. The test results are generally in line with expectations, and the system is highly practical.
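To make the idea of offloading concrete, the following is a minimal sketch in C of what moving an OpenMP loop to an accelerator can look like when done by hand with standard OpenMP 4.5 directives. It is not code generated by OAO, and it does not use OAO's runtime API; the function names (saxpy_host, saxpy_offload, saxpy_twice_offload) are hypothetical and chosen only for illustration. The third function shows one standard way to avoid redundant transfers between consecutive kernels, which is the kind of saving the abstract attributes to the data transmission module.

```c
#include <assert.h>
#include <stddef.h>

/* Host-only form: an ordinary OpenMP parallel loop computing y = a*x + y. */
void saxpy_host(int n, float a, const float *x, float *y) {
    assert(x != NULL && y != NULL);
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Offloaded form: the same loop under a target directive.
 * The map clauses state which arrays are copied to the device (x)
 * and which are copied both ways (y). Without a device, or without
 * OpenMP enabled, the loop simply runs on the host. */
void saxpy_offload(int n, float a, const float *x, float *y) {
    assert(x != NULL && y != NULL);
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Two consecutive kernels on the same arrays. The enclosing
 * target data region maps x and y once; the inner map clauses then
 * find the data already present, so nothing is copied between
 * the two passes. y is copied back when the region ends. */
void saxpy_twice_offload(int n, float a, const float *x, float *y) {
    assert(x != NULL && y != NULL);
    #pragma omp target data map(to: x[0:n]) map(tofrom: y[0:n])
    {
        for (int pass = 0; pass < 2; ++pass) {
            #pragma omp target teams distribute parallel for \
                map(to: x[0:n]) map(tofrom: y[0:n])
            for (int i = 0; i < n; ++i)
                y[i] = a * x[i] + y[i];
        }
    }
}
```

Writing the map clauses and data regions by hand requires knowing where each array is read and written across the whole program; this is the analysis that a source-to-source tool such as the one described here aims to automate.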

Keywords/Search Tags:

High-performance computing, parallel programs, OpenMP, automatic offloading, compiler

Related items

1	Research On High Performance Of GRAPES Tangent/Adjoint Model With The MPI/OpenMP
2	Compiler Techniques for High Performance Computing, Energy Efficiency, and Resilience
3	Research Of High Performance Evolutionary Algorithm Based On Distributed Parallel Computing
4	Performance Analysis Of OpenMP Parallel Programs
5	Research On High Performance Parallel Computing Architecture Based On FPGA+DSP
6	Research On Compilation And Optimization For OpenMP Programs
7	The Research On Design Method Based On Parallel Program For Windows Environments
8	A compiler for parallel execution of numerical Python programs on graphics processing unit
9	The Design And Implementation Of An OpenMP-to-OpenCL Code Automatic Conversion Tool
10	Research On Automatic Generation Of Analytical Performance Model For Parallel Program