
Research on Key Techniques of Multi-device Cooperative Parallel Computing for New-type Heterogeneous Many-core Systems

Posted on: 2017-06-26 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: L J Wan
GTID: 1318330512959080 | Subject: Circuits and Systems
Abstract/Summary:
In recent years, heterogeneous many-core systems consisting of multiple multi-core CPUs and many-core accelerators have been widely used in high-performance computing because of their high performance, low power consumption, and low cost, and more and more parallel applications are being developed for such systems. However, most parallel applications can use only one kind of compute device in a heterogeneous many-core system, owing to the complicated system architecture and the lack of an easy-to-use heterogeneous cooperative parallel programming model. The other compute devices are left idle, which makes it difficult to fully exploit the performance of the entire system. How to make full use of all available compute devices of a heterogeneous many-core system to execute a parallel application efficiently and cooperatively has therefore become an urgent problem.

Heterogeneous cooperative parallel computing aims to use multiple compute devices of arbitrary types to perform a specific computational task cooperatively and concurrently on a heterogeneous many-core system, so as to improve the performance of the entire system. However, devices differ in architecture, instruction set, processing capability, memory capacity, communication capability, and so on, which poses a great challenge for heterogeneous cooperative parallel computing. This thesis presents an in-depth study of the key technologies of multi-device cooperative parallel computing for heterogeneous many-core systems, focusing on the following four aspects:

(1) To reduce the difficulty of heterogeneous cooperative parallel programming, lighten the burden on programmers, and efficiently support multi-device cooperative parallel computing of data-parallel applications on heterogeneous many-core systems, a directive-based heterogeneous cooperative parallel programming framework called OpenHCPP is proposed. By extending the widely used OpenMP, OpenHCPP gives programmers an easier and more flexible way to use all available compute devices of a heterogeneous many-core system to cooperatively execute a data-parallel application. With the source-to-source compiler and runtime system provided by OpenHCPP, programmers do not need to know how work is partitioned or how data is transferred between the devices that participate in the cooperative computation. The experimental results show that OpenHCPP greatly improves both the development efficiency and the execution efficiency of data-parallel applications on heterogeneous many-core systems.

(2) To use multiple compute devices to execute a data-parallel application reasonably, efficiently, and cooperatively on a heterogeneous many-core system, two inter-device dynamic task scheduling strategies that efficiently support heterogeneous cooperative parallel computing are proposed: a feedback-based dynamic and elastic task scheduling strategy and a preemption-based dynamic and elastic task scheduling strategy. The former is better suited to data-parallel applications whose computation and data are uniformly distributed and whose computational kernel needs to be executed only once or a few times. The latter is better suited to data-parallel applications whose computation and data are non-uniformly distributed and/or whose computational kernel needs to be executed many times. The experimental results show that the two proposed strategies not only achieve full utilization of all compute devices and good load balancing across devices, but also avoid the overhead caused by frequent device initializations, kernel launches, inter-device data transfers, and inter-device synchronizations.

(3) Because inter-device communication can easily become a performance bottleneck of multi-device cooperative parallel computing for some data-parallel applications on a heterogeneous many-core system, an incremental data transfer method and a communication optimization method based on software pipelining are proposed to effectively hide, reduce, or avoid the communication overhead between devices. The incremental data transfer method builds on the feedback-based dynamic and elastic task scheduling strategy and yields a new feedback-based dynamic task scheduling strategy that effectively avoids redundant data transfers between devices. The software-pipelining method builds on both the feedback-based and the preemption-based dynamic and elastic task scheduling strategies and overlaps kernel execution on the accelerator with host-accelerator data transfers, yielding new feedback-based and preemption-based dynamic and elastic task scheduling strategies that effectively hide the communication overhead between devices. The experimental results show that the two proposed inter-device communication optimization methods significantly improve the overall performance of multi-device cooperative parallel computing for data-parallel applications with a large inter-device communication overhead.

(4) Using the proposed heterogeneous cooperative parallel programming framework, inter-device task scheduling strategies, and inter-device communication optimization methods, an efficient CPU-GPU cooperative parallel implementation of a complex application (a parallel two-list algorithm for solving the subset-sum problem) is developed. Because the generation stage of the parallel two-list algorithm suffers from load imbalance and a large communication overhead between devices, its CPU-GPU cooperative parallel execution adopts the proposed feedback-based dynamic task scheduling strategy that avoids redundant data transfers between devices. Because the computation and data of the pruning and search stages are uniformly distributed, their CPU-GPU cooperative parallel execution adopts the proposed preemption-based dynamic and elastic task scheduling strategy. The experimental results show that the CPU-GPU cooperative parallel implementation of the parallel two-list algorithm significantly outperforms the CPU-only and GPU-only parallel implementations, owing to the full utilization of both CPU and GPU, good load balancing, and lower communication overhead between CPU and GPU.
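For readers unfamiliar with the algorithm named in (4), the following is a minimal sequential sketch of the classic two-list approach to the subset-sum problem, written in Python. It is not the thesis's CPU-GPU implementation: it only illustrates a generation stage (enumerating the subset sums of each half of the input) and a search stage (a two-pointer scan over the two sorted lists). The pruning stage is omitted because the abstract does not describe it, and the function names `subset_sums` and `two_list_subset_sum` are hypothetical, chosen here for illustration.

```python
from typing import List, Optional, Tuple


def subset_sums(items: List[int]) -> List[Tuple[int, int]]:
    """Generation stage: return (sum, bitmask) for every subset of items."""
    sums = [(0, 0)]  # the empty subset
    for i, w in enumerate(items):
        # Each existing subset spawns a copy that also contains item i.
        sums += [(s + w, mask | (1 << i)) for s, mask in sums]
    return sums


def two_list_subset_sum(weights: List[int], target: int) -> Optional[List[int]]:
    """Return one subset of weights that sums to target, or None."""
    half = len(weights) // 2
    left, right = weights[:half], weights[half:]

    # Generation: one list sorted ascending, the other descending.
    a = sorted(subset_sums(left))
    b = sorted(subset_sums(right), reverse=True)

    # Search: advance in a to increase the total, advance in b to decrease it.
    i = j = 0
    while i < len(a) and j < len(b):
        total = a[i][0] + b[j][0]
        if total == target:
            chosen = [left[k] for k in range(len(left)) if (a[i][1] >> k) & 1]
            chosen += [right[k] for k in range(len(right)) if (b[j][1] >> k) & 1]
            return chosen
        if total < target:
            i += 1
        else:
            j += 1
    return None
```

Each list holds about 2^(n/2) entries for n input items, so generation dominates memory use and data movement. This is consistent with the abstract singling out the generation stage as the one with load imbalance and a large communication overhead.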
Keywords/Search Tags: Heterogeneous many-core systems, Heterogeneous parallel programming, Heterogeneous parallel computing, Multi-device cooperative parallel computing, Data-parallel applications, Inter-device task scheduling, Inter-device communication optimization