Research On The Key Technologies In Binary Translation Of GPU Programs

Posted on: 2013-04-09  Degree: Master  Type: Thesis
Country: China  Candidate: Y Yu
GTID: 2248330395480590  Subject: Computer software and theory
Abstract/Summary:
Binary translation technology can port a binary program from a source machine to a target machine when no source code is available. It not only solves software and hardware compatibility problems, but also plays an important role in information security and is of great significance for the independent design of domestic processors and computer systems. However, traditional binary translation technology is limited to single-core processors. With growing demand and an increasing number of many-core processors on the market, binary translation faces a new challenge from computer systems built on these new architectures. Based on an in-depth understanding of the CUDA heterogeneous parallel computing architecture, this thesis studies a new binary translation framework and several key technologies. The main contributions of this thesis are:
1) Because the traditional binary translation framework cannot adapt to heterogeneous many-core architectures, a static binary translation framework for CUDA programs is designed and implemented. The framework adopts the idea of "divide and conquer": the CPU code and the GPU code of a CUDA program are translated by different translators to the target platform's master core and slave core array, respectively. In particular, this thesis studies binary translation of GPU programs and builds a prototype binary porting system, GPUtoM, which translates NVIDIA GPU programs to a domestic heterogeneous many-core processor.
2) Because many-core processors differ greatly in parallel granularity and thread hierarchy, this thesis presents a hierarchical thread mapping model. At the first level, each CTA of the kernel function is mapped to a slave core of the target platform; at the second level, the GPU threads of that CTA are executed one after another by a single target thread inside a loop.
3) Because GPU threads are allowed to perform barrier synchronization, this thesis proposes an enforcing-synchronization algorithm based on the Thread-loop Structure. Without changing the semantics of barrier synchronization, the algorithm splits the PTX program into two parts at each synchronization point, inserts instructions to protect and restore the thread execution environment, and executes each part in its own Thread-loop Structure.
4) Because many-core processors often have complex memory structures and a rich set of special-purpose memories, a memory mapping model for multi-level memory is presented. The model takes into account both the correctness and the efficiency of the generated code, and completes the memory mapping from the NVIDIA GPU to a domestic many-core processor.
Following the discussion of these key technologies, GPUtoM, a prototype binary translation system for GPU programs, is designed and implemented. Finally, GPUtoM is validated against several applications taken from Test-gpu, the CUDA SDK, and the Parboil benchmarks. The results show that the proposed methods and techniques are correct and effective.
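The thread mapping and enforcing-synchronization ideas in contributions 2) and 3) can be illustrated with a small simulation. The sketch below is not GPUtoM code and does not come from the thesis: the kernel, the names `run_cta`, `run_kernel`, `saved_v`, and the constant `CTA_SIZE` are all hypothetical, and the slave cores are simulated sequentially in Python. It assumes a kernel with one barrier, splits it at that barrier into two thread loops, and saves the one thread-private value that is live across the split.

```python
# Hypothetical CUDA kernel being modelled (an assumption, not from the thesis):
#   __global__ void k(int *d) {
#       __shared__ int s[CTA_SIZE];
#       int t = threadIdx.x, base = blockIdx.x * CTA_SIZE;
#       int v = d[base + t] * 2;        // thread-private value
#       s[t] = v;
#       __syncthreads();                // barrier: the split point
#       d[base + t] = s[CTA_SIZE - 1 - t] + v;
#   }

CTA_SIZE = 4  # threads per CTA (assumed for this example)


def run_cta(d, cta_id):
    """Run one CTA on one simulated slave core using two thread loops."""
    base = cta_id * CTA_SIZE
    shared = [0] * CTA_SIZE    # stands in for CUDA shared memory
    saved_v = [0] * CTA_SIZE   # per-thread context kept across the barrier

    # Thread loop 1: the code before the barrier, run for each GPU thread in turn.
    for t in range(CTA_SIZE):
        v = d[base + t] * 2
        shared[t] = v
        saved_v[t] = v         # protect the value that is live across the barrier

    # The barrier needs no code here: loop 1 has completed for every thread.

    # Thread loop 2: the code after the barrier.
    for t in range(CTA_SIZE):
        v = saved_v[t]         # restore the thread's execution environment
        d[base + t] = shared[CTA_SIZE - 1 - t] + v


def run_kernel(d, num_ctas):
    """First mapping level: one CTA per slave core (simulated one after another)."""
    for cta_id in range(num_ctas):
        run_cta(d, cta_id)
    return d
```

Calling `run_kernel(list(range(1, 9)), 2)` runs two CTAs of four threads each. Running both halves of the kernel in a single loop would let thread 0 read `shared[3]` before thread 3 had written it, which is why the split at the barrier is needed.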
Keywords/Search Tags: Binary Translation, Graphics Processing Units, Thread Mapping Model, Memory Mapping Model, Enforcing-Synchronization Algorithm