Research On Key Technologies For Pipeline Programs On Heterogeneous Systems

Posted on: 2020-10-14
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z Zheng
Full Text: PDF
GTID: 1368330626964470
Subject: Computer Science and Technology
Abstract/Summary:
Heterogeneous systems with GPUs as accelerators are important platforms for high-performance programs. Current GPU programming models provide excellent support for data-parallel programs but lack good support for pipeline programming. The pipeline programming model can simplify programming, better utilize the computational advantages of multi-device platforms, and expose parallelism in more dimensions, so it is used in various fields. On complex architectures, exploiting the parallelism between different task phases of a pipeline program (i.e., task parallelism) lets the program make better use of the hardware's computing power. Because the architecture of GPU heterogeneous systems is complex, task parallelism should help pipeline programs fully exploit hardware performance. However, current GPU programming models lack good support for pipeline parallelism, which makes it difficult to develop high-performance pipeline programs. First, they lack good support for the task-parallel mode: existing task-parallel methods suffer from low utilization of GPU computing resources and reduced program parallelism. Second, there is little research on cross-device pipeline optimization on CPU-GPU systems, and current programming models have difficulty supporting pipelined data transmission across devices. To address these problems, this dissertation studies optimization techniques for pipeline programs on heterogeneous systems from three aspects: computing, communication, and programming. The main research contents and contributions are as follows:

(1) This dissertation systematically analyzes and summarizes the performance bottlenecks of existing pipeline execution models on GPUs. Building on the existing time-dimension task scheduling models, it proposes the concept of pipeline task scheduling in the space dimension, along with two new pipeline task execution models. The new scheduling models effectively support task parallelism and improve program parallelism by reducing the register and shared-memory resources occupied by GPU threads, ultimately helping programs make fuller use of the computing resources of the CPU-GPU system. Experiments show that the new scheduling models bring up to a 6.9 times (average 2.88 times) performance improvement for pipeline programs.

(2) This dissertation proposes a method for efficient pipelined data transmission across devices. The method implements a cross-device queue data structure that supports fine-grained, small-batch, dynamic, asynchronous data transmission between multiple devices. On this basis, the dissertation solves the problem of multi-threaded access contention on shared data structures on the GPU, optimizes the performance of cross-device data transmission for Stencil pipelines, and proposes an efficient mechanism for checking the pipeline termination condition. Experiments show that this method brings a 1.22 times to 2.13 times performance improvement for cross-device pipeline programs.

(3) This dissertation summarizes eight key metrics of pipeline programs and proposes a hybrid model that can combine the advantages of all the pipeline execution models according to the load characteristics of the pipeline program. Because the parameter configuration space of the hybrid model is too large to explore directly, the dissertation proposes a performance tuning method that combines dynamic and static approaches. It also implements a pipeline programming framework for CPU-GPU heterogeneous systems, including a programming interface, performance tuning tools, and a runtime library, which helps developers write high-performance pipeline programs efficiently. Experiments show that developers need to modify only dozens of lines of code to port existing pipeline programs to the framework, and that the performance tuning tools can successfully find the optimal or a near-optimal parameter configuration.
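
As a rough illustration of the queue-based coupling between pipeline stages described in contribution (2), the following minimal Python sketch runs two stages as threads connected by a bounded queue, moves items in small batches, and detects termination with a sentinel value. It is a host-only analogy under stated assumptions, not the dissertation's cross-device implementation: the names `run_pipeline`, `producer`, `consumer`, `BATCH`, and `DONE` are hypothetical, and the squaring step merely stands in for real stage work.

```python
import queue
import threading

DONE = object()   # hypothetical sentinel marking the end of the stream
BATCH = 4         # hypothetical small batch size


def producer(items, q):
    """Stage 1: push work downstream in small batches, then signal termination."""
    for i in range(0, len(items), BATCH):
        q.put(items[i:i + BATCH])
    q.put(DONE)


def consumer(q, out):
    """Stage 2: process batches as they arrive until the sentinel is seen."""
    while True:
        batch = q.get()
        if batch is DONE:
            break
        out.extend(x * x for x in batch)  # placeholder for real stage work


def run_pipeline(items):
    """Run both stages concurrently and return the results of stage 2."""
    q = queue.Queue(maxsize=2)  # bounded queue applies back-pressure to stage 1
    out = []
    t1 = threading.Thread(target=producer, args=(items, q))
    t2 = threading.Thread(target=consumer, args=(q, out))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return out
```

With a single producer and a single consumer the queue preserves order, so `run_pipeline(list(range(10)))` returns the squares of 0 through 9 in order. A real cross-device queue would additionally have to handle contention among many GPU threads and asynchronous host-device transfers, which this sketch does not model.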
Keywords/Search Tags:pipeline program, pipelined communication, performance optimization, GPU