
Research On Performance Analysis And Optimization Of CPU-GPU Heterogeneous System

Posted on: 2020-02-14    Degree: Doctor    Type: Dissertation
Country: China    Candidate: Y H Wang    Full Text: PDF
GTID: 1488306353451164    Subject: Computer application technology
Abstract/Summary:
In recent years, parallel computing and high-performance computing have developed rapidly. Large-scale computing and data-intensive computing place demands on processing power that the CPU alone can no longer meet. The graphics processing unit (GPU) integrates a huge number of transistors as its processing cores, and with the rapid development of general-purpose computing on GPUs (GPGPU), its powerful processing capability shows ever clearer advantages for large-scale computation. The application scope of the GPU has therefore grown steadily, heterogeneous systems have developed gradually, and performance optimization of heterogeneous systems has become a hot research topic. However, the complexity and particularity of the GPU architecture pose great challenges to the performance optimization of heterogeneous systems.

Parallel computing uses GPGPU to improve the performance and efficiency of the system, and many current studies focus on improving the performance of heterogeneous systems. In this dissertation we focus on performance optimization of CPU-GPU heterogeneous systems. We analyze the relevant GPGPU technologies in depth, and we detail task allocation, communication models, workload scheduling, memory models of heterogeneous systems, and other research topics and methods based on the GPU architecture. On this basis we study performance optimization methods for heterogeneous systems. The main research contents and results are as follows:

(1) Aiming at the problem of allocating tasks between the CPU and the GPU in a CPU-GPU heterogeneous system, we propose and implement a two-stage task allocation model. In the first stage, called pre-treatment, a support vector machine (SVM) classifies each task as CPU-kind or GPU-kind, yielding two task sets. In the second stage, starting from these two sets, we propose a task allocation model based on data dependence and a task allocation model based on minimizing the time gap. By repeatedly adjusting the task sets allocated to the CPU and the GPU, the data-dependence-based model reduces execution time as far as possible; after several adjustments it performs the allocation according to the characteristics and status of the processors and the result of the first-stage pre-treatment. Benchmarks run on a real heterogeneous system show that the proposed model improves the efficiency and throughput of the heterogeneous system. The time-gap-based model estimates the execution times on the CPU and the GPU and, after adjusting the two pre-treated task sets, performs the allocation once the time gap between the two sets reaches its minimum. This model substantially improves allocation efficiency and heterogeneous system performance, and its allocation overhead is small. The two-stage idea is sketched below.
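Below is a minimal Python sketch of the two-stage allocation idea under illustrative assumptions: the task features (data size, arithmetic intensity, degree of parallelism), the SVM configuration, and the greedy time-gap adjustment are hypothetical stand-ins, not the dissertation's actual models.

    # Hypothetical sketch of the two-stage task allocation described above.
    # Features, SVM settings, and the greedy adjustment are illustrative only.
    from sklearn import svm

    # Stage 1 (pre-treatment): classify each task as CPU-kind (0) or GPU-kind (1)
    # from simple features, e.g. [data size in MB, arithmetic intensity, parallelism].
    train_features = [[1, 0.2, 4], [512, 8.0, 4096], [4, 0.5, 8], [256, 6.0, 2048]]
    train_labels = [0, 1, 0, 1]
    classifier = svm.SVC(kernel="rbf")
    classifier.fit(train_features, train_labels)

    def pretreat(tasks):
        """Split tasks into a CPU-kind set and a GPU-kind set."""
        cpu_set, gpu_set = [], []
        for task in tasks:
            kind = classifier.predict([task["features"]])[0]
            (gpu_set if kind == 1 else cpu_set).append(task)
        return cpu_set, gpu_set

    # Stage 2: greedily move tasks between the two sets until the gap between the
    # estimated CPU and GPU completion times can no longer be reduced.
    def minimize_time_gap(cpu_set, gpu_set):
        cpu_total = lambda: sum(t["cpu_time"] for t in cpu_set)
        gpu_total = lambda: sum(t["gpu_time"] for t in gpu_set)
        improved = True
        while improved:
            improved = False
            gap = abs(cpu_total() - gpu_total())
            src, dst = (cpu_set, gpu_set) if cpu_total() > gpu_total() else (gpu_set, cpu_set)
            for task in list(src):
                src.remove(task)
                dst.append(task)
                if abs(cpu_total() - gpu_total()) < gap:
                    improved = True
                    break
                dst.remove(task)
                src.append(task)
        return cpu_set, gpu_set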
(2) For large-scale parallel workloads, the scheduling strategy can seriously affect system performance. To solve the scheduling problem, we schedule data transfers before scheduling workload execution and propose an optimal scheduling algorithm for GPU workloads. By hiding data transfers behind workload execution as far as possible, the algorithm reduces waiting time, so a small makespan is achieved. We reduce the problem of hiding data transfers behind workload execution to a 0-1 knapsack problem, propose a pseudo-polynomial-time algorithm (PPTA) based on the Dyer-Zemel algorithm, and then derive a fully polynomial-time approximation scheme (FPTAS) from the PPTA. Our scheduling algorithm effectively estimates the optimal schedule sequence for large-scale workloads on the GPU, so the idle time of the processing cores is effectively reduced. As a result, the scheduling problem is well solved and the system performance is optimized. The knapsack formulation is sketched after this abstract.

(3) Frequent global memory accesses can create a serious bottleneck in GPU kernels; global memory access congestion brings low throughput and poor performance. In this dissertation we analyze the crucial characteristics of global memory access: the address distribution ratio of memory accesses, the bandwidth utilization between global memory and the streaming multiprocessors (SMs), the ratio of coalesced memory accesses, the ratio of computing instructions to memory instructions, and the ratio of read instructions to write instructions. Based on these characteristics we propose a model for judging global memory access congestion, which classifies the degree of congestion of global memory accesses. After analyzing the congested objects and selecting the access data, optimization is carried out with a grey target decision model based on cobweb area, so the congestion is relieved. Experimental results demonstrate that the proposed global memory congestion mitigation model alleviates access congestion to some extent, improves the efficiency of global memory accesses, and thus improves system performance.
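For contribution (2), the following minimal sketch shows the 0-1 knapsack view of hiding data transfers behind kernel execution. The transfer times, saved wait times, and integer capacity are illustrative assumptions, and the textbook dynamic program stands in for the dissertation's Dyer-Zemel-based PPTA and FPTAS, which are not reproduced.

    # Hypothetical 0-1 knapsack sketch for contribution (2): choose which pending
    # data transfers to hide behind a running kernel. "weight" = transfer time,
    # "value" = wait time saved, capacity = the kernel's execution time window.
    def hide_transfers(transfers, kernel_time):
        """transfers: list of (transfer_time, saved_wait); kernel_time: integer capacity.
        Returns the maximum total wait time that can be hidden."""
        best = [0] * (kernel_time + 1)                      # best[c] = max value within capacity c
        for t_time, saved in transfers:
            for c in range(kernel_time, t_time - 1, -1):    # scan capacities downwards
                best[c] = max(best[c], best[c - t_time] + saved)
        return best[kernel_time]

    # Illustrative numbers: three pending transfers, kernel window of 10 time units.
    print(hide_transfers([(4, 4), (3, 5), (6, 7)], 10))     # -> 12 (hide the 3- and 6-unit transfers)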
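For contribution (3), the sketch below derives the listed global-memory access characteristics from profiler-style counters and applies a simple threshold classification. The counter names, formulas, and thresholds are illustrative assumptions; the dissertation's congestion-judging model and the grey target decision model based on cobweb area are not reproduced.

    # Hypothetical sketch for contribution (3): compute global-memory access
    # characteristics from raw counters and classify the congestion degree.
    # Counter names and thresholds are assumed for illustration only.
    def memory_access_characteristics(counters):
        """counters: dict of raw event counts collected for one kernel."""
        return {
            "coalesced_ratio": counters["coalesced_transactions"] / counters["global_transactions"],
            "compute_to_memory_ratio": counters["compute_instructions"] / counters["memory_instructions"],
            "read_to_write_ratio": counters["read_instructions"] / counters["write_instructions"],
            "bandwidth_utilization": counters["achieved_bandwidth_gbps"] / counters["peak_bandwidth_gbps"],
        }

    def congestion_degree(ch):
        """Classify congestion as light / moderate / heavy using assumed thresholds."""
        if ch["coalesced_ratio"] > 0.8 and ch["bandwidth_utilization"] < 0.6:
            return "light"
        if ch["coalesced_ratio"] > 0.5 or ch["compute_to_memory_ratio"] > 2.0:
            return "moderate"
        return "heavy"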
Keywords/Search Tags:GPU, heterogeneous system, task allocation, workload scheduling, data transfer, global memory, memory access congestion, performance optimization