Dataflow Task Partition And Scheduling For GPU/CPU Heterogeneous System

Posted on: 2020-11-02
Degree: Master
Type: Thesis
Country: China
Candidate: M T Chen
GTID: 2428330590958394
Subject: Computer application technology
Abstract/Summary:
Heterogeneous computers combine the massive parallelism of the GPU with the logical processing power of the CPU, and are widely used in business and scientific research. To fully exploit the performance of a heterogeneous computer, however, its hardware resources must be allocated properly. COStream, a dataflow programming language, performs well on traditional multi-core CPUs, but on heterogeneous systems it suffers from problems such as unbalanced load and excessive communication cost.

To take full advantage of heterogeneous systems, this thesis takes the COStream dataflow programming language as its research object and designs a dataflow task partition algorithm and optimization schemes for GPU/CPU heterogeneous systems: the HLBP (heterogeneous load balancing partition) algorithm, inter-device communication optimization, and NDRange optimization.

The HLBP algorithm partitions the dataflow graph in three steps: preliminary partition of dataflow tasks, load estimation and adjustment, and inter-device task partition. Its advantages are that it fully exploits the parallelism of the dataflow program by placing each actor on the most suitable device, so that the strengths of each device are used; that it estimates and adjusts the load to balance it among devices; and that it takes communication overhead into account while balancing the load, which improves program performance.

The communication optimization targets the large communication overhead between devices. It separates communication tasks from GPU nodes and uses software pipelining to hide the communication overhead behind computation. The NDRange optimization algorithm automatically tunes the NDRange allocation for actors of different sizes, so that the generated kernels make full use of the GPU hardware resources and the program executes more efficiently.

An x86 heterogeneous computer is used as the experimental platform, and six typical algorithms from the digital media field are selected as benchmarks to verify the effectiveness of the approach. The results show that the HLBP algorithm and the optimization schemes are effective.
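The partitioning idea in the abstract (a preliminary placement, then load adjustment that also weighs communication) can be illustrated with a small Python sketch. This is not the thesis's HLBP algorithm: the cost model (busiest device's load plus the weight of edges that cross devices), the greedy single-actor moves, and all names (`Actor`, `estimated_time`, `partition`) are assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Actor:
    """Hypothetical actor with an estimated cost on each device."""
    name: str
    cpu_cost: float
    gpu_cost: float


def estimated_time(actors, edges, assign):
    """Assumed cost model: busiest device load + weight of cross-device edges."""
    load = {"cpu": 0.0, "gpu": 0.0}
    for a in actors:
        device = assign[a.name]
        load[device] += a.cpu_cost if device == "cpu" else a.gpu_cost
    cut = sum(w for src, dst, w in edges if assign[src] != assign[dst])
    return max(load.values()) + cut


def partition(actors, edges):
    # Step 1 (preliminary partition): put each actor where it runs fastest.
    assign = {a.name: "gpu" if a.gpu_cost < a.cpu_cost else "cpu" for a in actors}
    best = estimated_time(actors, edges, assign)

    # Steps 2-3 (adjustment): greedily move one actor at a time to the other
    # device whenever that lowers the estimated time, which accounts for both
    # load balance and communication cost. Stops when no move helps.
    improved = True
    while improved:
        improved = False
        for a in actors:
            trial = dict(assign)
            trial[a.name] = "cpu" if assign[a.name] == "gpu" else "gpu"
            cost = estimated_time(actors, edges, trial)
            if cost < best:
                assign, best, improved = trial, cost, True
    return assign, best


# Toy pipeline: every actor is slightly faster on the GPU, so the preliminary
# placement overloads the GPU and the adjustment step has to rebalance.
actors = [Actor(n, cpu_cost=4.0, gpu_cost=3.0) for n in ("a", "b", "c", "d")]
edges = [("a", "b", 0.5), ("b", "c", 0.5), ("c", "d", 0.5)]
assignment, time_estimate = partition(actors, edges)
print(assignment, time_estimate)
```

In this toy pipeline the preliminary step places all four actors on the GPU; the adjustment then moves the first two to the CPU, trading one cross-device edge for a more even load. A real partitioner would use measured actor costs and a more careful search than this greedy loop.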
Keywords/Search Tags: Dataflow, Heterogeneous System, Task Partition, Communication, OpenCL