Research On Key Technologies Of Code Generation For CPU-GPU Heterogeneous Parallel Computing

Posted on: 2018-02-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y D Zhao
GTID: 2348330512489102
Subject: Computer software and theory
Abstract/Summary:
General-purpose computing on graphics processing units (GPUs), which has recently gained wide popularity among researchers and engineers, provides a convenient platform for accelerating compute-bound applications. Programming frameworks for CPU-GPU heterogeneous parallel computing platforms, such as CUDA and OpenCL, provide relatively low-level interfaces that use the GPU in SIMT and SPMD fashion to solve problems that can be expressed as data-parallel computation. With the low-level languages provided by the vendors, however, researchers and engineers need to understand the hardware architecture, the memory hierarchy, and the execution model of CPU-GPU heterogeneous systems. They must also handle a large number of details involving synchronization and the optimized use of the various kinds of device memory. These burdens make programming CPU-GPU heterogeneous parallel computing platforms to solve scientific computing problems difficult and error-prone.

Advances in machine learning, data mining, image processing, and other domains place high demands on the efficiency of large-scale scientific computation. Although Nvidia CUDA provides C-like interfaces for programming Nvidia GPUs, programming CPU-GPU heterogeneous parallel computing platforms remains challenging for many programmers. We therefore consider it necessary to devise a high-level programming model that is both efficient and easy to use. In this thesis, we introduce a tool chain for programming CPU-GPU heterogeneous parallel computing platforms:

1. We design a programming language, RoyaL, which supports large-scale linear algebra operations. This scripting language provides matrix types and related operations in its type system, and it supports strongly typed semantic checking.
2. We develop a compiler framework, Roya, to translate RoyaL programs into optimized Nvidia CUDA code. The Roya compiler framework implements key modules such as the abstract syntax tree, the symbol table, and the intermediate representation. On this basis, various optimization methods are applied to the input source code at different stages of compilation.

These tools hide the complexities of the hardware architecture, the memory hierarchy, and the execution model of CPU-GPU heterogeneous systems, and they handle the tedious and error-prone tasks automatically. The RoyaL programming language and the Roya compiler framework, which together provide a high-level programming model for users, not only perform domain-specific optimizations involving matrix chain multiplication and addition, but also find parallelizable patterns in nested loops and automatically extract them as kernel functions. On this basis, we conduct a series of comparative experiments to validate the optimization methods and present the performance gains in execution time under different scenarios. Finally, we summarize the drawbacks of the type system and of the block parallel methods of the Roya compiler framework, and we propose directions for future research.
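The abstract does not say how Roya reorders matrix chain multiplications. As a rough illustration of this kind of domain-specific optimization, the sketch below uses the classic dynamic-programming formulation, which picks the parenthesization with the fewest scalar multiplications. The function name `matrix_chain_order` and the `A0`, `A1`, ... labels are hypothetical and are not taken from the thesis.

```python
def matrix_chain_order(dims):
    """Pick a cheap multiplication order for a chain of matrices.

    Matrix i has shape dims[i] x dims[i + 1]. Returns a tuple of the
    minimum number of scalar multiplications and a parenthesized plan.
    Illustrative only; not the algorithm confirmed to be used by Roya.
    """
    n = len(dims) - 1  # number of matrices in the chain
    if n < 1:
        raise ValueError("dims must describe at least one matrix")

    # cost[i][j]: cheapest way to multiply matrices i..j (inclusive).
    cost = [[0] * n for _ in range(n)]
    # split[i][j]: index k where the best plan splits i..j into i..k, k+1..j.
    split = [[0] * n for _ in range(n)]

    for length in range(2, n + 1):          # length of the sub-chain
        for i in range(n - length + 1):
            j = i + length - 1
            cost[i][j] = float("inf")
            for k in range(i, j):
                candidate = (cost[i][k] + cost[k + 1][j]
                             + dims[i] * dims[k + 1] * dims[j + 1])
                if candidate < cost[i][j]:
                    cost[i][j] = candidate
                    split[i][j] = k

    def render(i, j):
        # Rebuild the parenthesization from the recorded split points.
        if i == j:
            return f"A{i}"
        k = split[i][j]
        return f"({render(i, k)} * {render(k + 1, j)})"

    return cost[0][n - 1], render(0, n - 1)


if __name__ == "__main__":
    # Shapes: A0 is 10x100, A1 is 100x5, A2 is 5x50.
    print(matrix_chain_order([10, 100, 5, 50]))
```

For these shapes, multiplying `A0 * A1` first costs 7,500 scalar multiplications, while multiplying `A1 * A2` first costs 75,000. A compiler that knows matrix shapes at compile time could apply a reordering of this kind before emitting GPU kernels.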
Keywords/Search Tags: CPU-GPU Heterogeneous Platform, Compiler Framework, Optimization, Code Generation, CUDA