Modern GPUs support multiple applications running concurrently, and these applications demand different GPU resources at runtime. If resources are not allocated to match these requests, they are often underutilized, which poses a challenge for programmers. Kernels are central to GPU computing, and efficiently allocating the GPU resources that kernels require helps shorten the running time of the entire application. Prior research shows that kernels can be classified by their resource requirements during execution, and that concurrently executing kernels of different classes makes fuller use of GPU resources, thereby shortening the overall running time of multiple applications. Based on this idea, and building on previous work on kernel scheduling on GPUs, this paper proposes a multi-stream CUDA (streams CUDA, sCUDA) scheduling framework to schedule concurrent kernels more efficiently. It consists of three parts: multi-stream concurrent kernel scheduling, dynamic CUDA stream scheduling, and a block-wise data transfer strategy.

First, based on the classification of kernels by resource requirements and on the concurrent execution of pairs of kernels of different types, this paper proposes a multi-stream concurrent kernel scheduling scheme. For each pair of concurrently executing kernels, the thread blocks assigned to them are divided into slices equal in number to the SMs on the GPU, forming sub-kernels; these slices are then mixed at a ratio of 1:n and distributed to different CUDA streams until all slices of the pair have been issued, further increasing the degree of concurrency between kernels of different types. Here n is the number of shorter-running sub-kernels that can complete within the execution time of one longer-running sub-kernel.

Second, because CUDA stream creation and scheduling incur overhead and a fixed number of streams does not suit all kernels, this paper proposes a dynamic CUDA stream scheduling strategy that allocates the optimal number of streams at runtime. Finally, following the idea of kernel slicing and offloading, the data-transfer phase of GPU computing is divided into blocks that are spread across streams, reducing the occurrence of long blocking transfers and further shortening the running time of multiple GPU tasks.

This paper evaluates the performance and execution time of sCUDA on 11 kernels drawn from three widely used benchmark suites. The experimental results show that the framework allocates the GPU resources required by kernels appropriately and, through efficient scheduling of CUDA stream resources, further shortens the execution time of multiple applications, achieving an average improvement of 20% over cCUDA (concurrent CUDA, 2019).
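The slicing and 1:n mixing described above can be illustrated with a small host-side sketch. This Python fragment is purely illustrative (the actual sCUDA framework dispatches real CUDA kernels; the function names `slice_kernel` and `interleave` are assumptions, not part of the paper): it splits a kernel's thread blocks into one slice per SM and interleaves one long-kernel slice with n short-kernel slices, producing an issue order suitable for round-robin dispatch to streams.

```python
def slice_kernel(total_blocks, num_sms):
    """Split a kernel's thread blocks into num_sms near-equal slices
    (sub-kernels). Returns a list of per-slice block counts."""
    base, rem = divmod(total_blocks, num_sms)
    return [base + (1 if i < rem else 0) for i in range(num_sms)]

def interleave(long_slices, short_slices, n):
    """Mix slices at a 1:n ratio: after each slice of the longer-running
    kernel, issue up to n slices of the shorter-running kernel.
    Returns a list of (kernel_tag, slice_size) pairs in issue order."""
    order, li, si = [], 0, 0
    while li < len(long_slices) or si < len(short_slices):
        if li < len(long_slices):
            order.append(("long", long_slices[li]))
            li += 1
        for _ in range(n):
            if si < len(short_slices):
                order.append(("short", short_slices[si]))
                si += 1
    return order
```

For example, with 4 SMs, a long kernel of 8 blocks, a short kernel of 12 blocks, and n = 2, each long slice is followed by two short slices, so both kernels' slices finish being issued at roughly the same time.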
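The block-wise data transfer strategy can likewise be sketched by planning how one large copy is split into fixed-size chunks assigned round-robin to streams. The sketch below (the name `chunk_transfer` is hypothetical) only computes the dispatch plan; a real implementation would issue one asynchronous copy (e.g. cudaMemcpyAsync on pinned memory) per planned chunk so that no single long transfer monopolizes the copy engine.

```python
def chunk_transfer(nbytes, chunk_size, num_streams):
    """Split one large host<->device copy of nbytes into chunks of at most
    chunk_size bytes and assign them round-robin across num_streams streams.
    Returns a plan as a list of (stream_id, offset, size) tuples."""
    plan, offset, i = [], 0, 0
    while offset < nbytes:
        size = min(chunk_size, nbytes - offset)
        plan.append((i % num_streams, offset, size))
        offset += size
        i += 1
    return plan
```

Spreading chunks over streams lets later chunks overlap with kernel execution in other streams, which is the effect the abstract attributes to reducing "long data transfer tasks".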