
Optimization Methods For Irregular Tasks On CPU-GPU Heterogeneous Platform

Posted on: 2022-01-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J L Cao
Full Text: PDF
GTID: 1488306338484854
Subject: Computer application technology
Abstract/Summary:
When executed on a parallel platform, irregular tasks cause control-flow divergence, load imbalance, poor data-access locality, and related problems, which result in low resource utilization. To achieve high performance at low cost, a large number of irregular tasks need to be migrated to CPU-GPU heterogeneous platforms, so studying their implementation and optimization on such platforms has practical significance and value. This dissertation studies the parallel decomposition, mapping, and execution of irregular tasks with dynamic computational load and uneven data decomposition on CPU-GPU platforms. From the perspectives of parallel algorithm design and hardware platform improvement, we optimize the performance of irregular task-solving algorithms through thread task merging, CUDA dynamic parallelism, and improvements to the pipeline structure. The main research contents and innovations are as follows:

(1) We study the implementation and optimization of irregular tasks with dynamic workloads on CPU-GPU heterogeneous platforms. Taking the image region filling algorithm as the research object, we propose a multi-seed parallel scheme based on a connected graph and the union-find (disjoint-set) algorithm. The algorithm randomly divides the fill area at a reasonable granularity and completes the region fill through competition and cooperation among threads. We implement the filling and merging phases of the parallel algorithm using a multi-threaded scheme on the CPU and a CUDA scheme on the GPU. After comprehensively measuring the implementation cost, performance benefit, and data transfer load of these schemes, we combine the best ones into a complete algorithm. Experiments show that the parallel algorithm can be used in application scenarios with real-time requirements and that the optimization yields significant benefits for this type of irregular task.

(2) On a single-node CPU-GPU heterogeneous platform, we study the parallel implementation of irregular tasks with uneven data decomposition. On both CPU and GPU, we explore ways to optimize the candidate-solution evaluation algorithm for N-Queen variant 2 by changing the granularity of data decomposition, combining subtasks, and using CUDA dynamic parallelism, among other means. The effectiveness of these optimizations is verified with a simulated annealing algorithm. Building on this work, we explore task decomposition and mapping between and within nodes of a GPU cluster, and we use MPI to construct a two-level parallel genetic algorithm that combines the island model and the master-slave model to solve variant 2 on a cluster of multiple CPU-GPU nodes. Compared with the best heuristic algorithms of the same type, our solution increases both the solvable problem size and the solving speed.

(3) We explore how to improve the execution efficiency of unbalanced tasks by improving the GPU pipeline execution model. Using the GPGPU-Sim simulator, we observe runtime metrics of unbalanced tasks, such as cache hit rate and pipeline idle cycles. We find that the pipeline stalls periodically while unbalanced tasks execute on the GPU, and we show that the cause is compulsory misses in the I-Cache of the streaming multiprocessor. We classify 86 kernels from 31 programs in Rodinia, ispass-2009, and the CUDA SDK by capacity, analyze the relationship between kernel capacity and miss rate, and design an instruction cache prefetching mechanism suited to the GPU execution model. Experiments show that the prefetching mechanism effectively reduces the long latency and pipeline stalls caused by compulsory misses in the L1 I-Cache and achieves an average performance improvement of 12.17% over the baseline model. Compared with a larger-cache scheme, our mechanism has a lower hardware cost and benefits more programs.
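The two-phase idea behind the multi-seed region filling scheme in (1) can be illustrated with a small sequential sketch. It is not the dissertation's multi-threaded or CUDA implementation: the round-robin loop only simulates competing threads, "union search" is taken to mean a standard union-find (disjoint-set) structure, and the names `multi_seed_fill`, `find`, and `union` are hypothetical. Each seed claims pixels under its own label (filling phase), labels whose fronts touch are unioned, and only pixels whose label is connected to the starting seed are recolored (merging phase).

```python
from collections import deque


def find(parent, x):
    """Return the representative of x, compressing the path as we go."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x


def union(parent, a, b):
    """Merge the sets containing labels a and b."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[max(ra, rb)] = min(ra, rb)


def multi_seed_fill(grid, start, seeds, new_color):
    """Recolor the 4-connected region containing `start`.

    `seeds` are extra starting points (assumed to be scattered at random in
    a real run); seeds that are not on the target color are ignored.
    """
    rows, cols = len(grid), len(grid[0])
    target = grid[start[0]][start[1]]

    # Label 0 always belongs to the user's starting pixel.
    valid = [s for s in seeds if grid[s[0]][s[1]] == target]
    all_seeds = list(dict.fromkeys([start] + valid))  # de-duplicate, keep order

    label = [[-1] * cols for _ in range(rows)]
    parent = list(range(len(all_seeds)))
    queues = []
    for i, (r, c) in enumerate(all_seeds):
        label[r][c] = i
        queues.append(deque([(r, c)]))

    # Filling phase: seeds take turns expanding, simulating competing threads.
    active = True
    while active:
        active = False
        for i, q in enumerate(queues):
            if not q:
                continue
            active = True
            r, c = q.popleft()
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if not (0 <= nr < rows and 0 <= nc < cols):
                    continue
                if grid[nr][nc] != target:
                    continue
                if label[nr][nc] == -1:
                    label[nr][nc] = i          # this seed wins the pixel
                    q.append((nr, nc))
                elif label[nr][nc] != i:
                    union(parent, i, label[nr][nc])  # two fronts met

    # Merging phase: keep only labels connected to the starting seed.
    root = find(parent, 0)
    out = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            if label[r][c] != -1 and find(parent, label[r][c]) == root:
                out[r][c] = new_color
    return out
```

A seed that lands in a separate region of the same color fills its own pixels but is never unioned with label 0, so that region is left unchanged. This is why the merging phase is needed on top of the independent fills.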
Keywords/Search Tags: Irregular Task, CPU-GPU Heterogeneous Computing, Multi-seed Region Filling Algorithm, Parallel Genetic Algorithm, Instruction Prefetch