
Research On GPU Load-Balancing Scheduling Model In Distributed Heterogeneous Computing Framework

Posted on: 2022-01-21
Degree: Master
Type: Thesis
Country: China
Candidate: L F Du
Full Text: PDF
GTID: 2518306731987909
Subject: Computer Science and Technology
Abstract/Summary:
GPUs are widely used in high-performance computing and deep learning because of their high concurrency, high bandwidth, and low power consumption. To make full use of the computing power of CPU-GPU heterogeneous clusters, the open-source distributed processing framework Spark has recently added features that support GPU acceleration, broadening the use of distributed computing frameworks across application scenarios. The Flink framework, however, does not support GPU scheduling. When the size of a cluster is limited, its computing power can be increased by adding GPUs to the computing nodes. In a distributed heterogeneous environment, however, the computing power of each GPU changes dynamically, and GPUs with different architectures differ greatly in computing power. To balance the workload among heterogeneous GPUs, each GPU should therefore receive a workload that matches its computing power. Current open-source distributed processing frameworks (such as Spark and Flink) lack an effective GPU workload-balancing scheduling model.

To integrate GPUs effectively into a distributed processing framework and to resolve GPU workload imbalance in distributed heterogeneous environments, this paper proposes a multi-GPU load-balancing scheduling model (MLSM). MLSM handles GPU task scheduling and device resource management in a distributed heterogeneous environment. Its contributions are:
(1) A fine-grained task mapping mechanism that maps the JVM tasks of the distributed computing framework to GPU tasks. An automatic task decomposition mechanism splits each JVM task into a series of finer-grained GPU tasks that can execute concurrently, which increases the concurrency of GPU tasks and lays the foundation for GPU workload-balancing scheduling.
(2) A unified device resource management scheme that manages GPU memory and asynchronous GPU streams and reduces the complexity of the programming model. The scheme can cache data in GPU memory to reduce communication overhead.
(3) A feedback-based stream adjustment scheme that adjusts the asynchronous stream resources of each GPU according to the calculation results of the previous iteration. A GPU with higher computing power is assigned more asynchronous GPU streams, and the number of idle asynchronous streams serves as a dynamic measure of a GPU's computing power.
(4) A novel resource-aware GPU task scheduling strategy that schedules GPU tasks according to the real-time computing power and resource state of each GPU, achieving workload balance among heterogeneous GPUs.

Finally, the MLSM heterogeneous task scheduling model is implemented on the Spark computing framework, and several representative algorithms are used to evaluate its performance. The experimental results show that MLSM can effectively integrate GPUs into the distributed processing framework and balance the workload among heterogeneous GPUs.
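The abstract gives no formulas or interfaces, so the following is only a minimal Python sketch of how task decomposition, idle-stream-based scheduling, and feedback-based stream adjustment might fit together. Every name in it (`Gpu`, `decompose`, `pick_gpu`, `rebalance_streams`) is hypothetical, and two choices are assumptions rather than the thesis's method: taking the number of GPU tasks completed as the feedback signal, and sharing streams in proportion to it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Gpu:
    name: str
    streams: int   # asynchronous streams currently granted to this GPU
    busy: int = 0  # streams currently running a GPU task
    done: int = 0  # GPU tasks completed during the current iteration

    @property
    def idle(self) -> int:
        # Idle streams stand in for the GPU's current spare capacity.
        return self.streams - self.busy


def decompose(task_size: int, chunk: int) -> List[range]:
    """Split one coarse task of task_size items into GPU tasks of at most chunk items."""
    return [range(lo, min(lo + chunk, task_size)) for lo in range(0, task_size, chunk)]


def pick_gpu(gpus: List[Gpu]) -> Optional[Gpu]:
    """Resource-aware choice: the GPU with the most idle streams, or None if none is idle."""
    best = max(gpus, key=lambda g: g.idle)
    return best if best.idle > 0 else None


def rebalance_streams(gpus: List[Gpu], total_streams: int) -> None:
    """Feedback step (assumed rule): share streams in proportion to tasks completed.

    Intended to run between iterations, when no stream is busy. Rounding and the
    one-stream minimum mean the shares may not sum exactly to total_streams.
    """
    completed = sum(g.done for g in gpus)
    if completed == 0:
        return  # no feedback yet; keep the current allocation
    for g in gpus:
        g.streams = max(1, round(total_streams * g.done / completed))
        g.done = 0  # reset the counter for the next iteration
```

In this sketch a scheduler would call `decompose` once per JVM task, hand each resulting GPU task to the device returned by `pick_gpu`, and call `rebalance_streams` after every iteration so that faster GPUs hold more streams in the next one.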
Keywords/Search Tags:distributed computing, GPU acceleration, heterogeneous environment, GPU load-balancing scheduling