
Research On Workload Analysis And Optimizations On Heterogeneous Integrated Architectures

Posted on: 2018-06-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: F Zhang
GTID: 1368330566487975
Subject: Computer Science and Technology
Abstract/Summary:
The integrated architecture, which places both the CPU and the GPU on the same die, is an emerging and promising architecture for fine-grained CPU-GPU collaboration. How to effectively leverage the advantages of both CPUs and GPUs on integrated architectures is still an open problem. The integration also raises several programming and system optimization challenges, especially for irregular applications. The complex interplay between heterogeneity and irregularity leads to very low processor utilization when running irregular applications on integrated architectures. In this dissertation, we focus on how to utilize the integrated architecture efficiently, especially for irregular applications. The main contributions of this dissertation are as follows:

(1) CoRunBench, a benchmark for understanding co-running on integrated architectures. We port 42 programs from the Rodinia, Parboil, and Polybench benchmark suites and analyze their co-running behavior on both AMD and Intel integrated architectures. We find that co-running does not always outperform running a program on the CPU alone or the GPU alone. Of the 42 programs, only 8 benefit from co-running, while 24 achieve their best performance using only the GPU and 7 using only the CPU. The remaining 3 programs show little performance preference among devices. Through extensive workload characterization, we find that the architectural differences between CPUs and GPUs and the limited shared memory bandwidth are the two main factors affecting current co-running performance.

(2) A tool that assists users in porting programs to integrated architectures. Since not all programs benefit from integrated architectures, we build an automatic decision-tree-based model that helps application developers predict the co-running performance of a given CPU-only or GPU-only program. Results show that our model correctly predicts 14 of the 15 evaluated programs. For a co-run-friendly program, we further propose a profiling-based method to predict the optimal workload partition ratio between CPUs and GPUs. Results show that our model achieves 87.7% of the performance of the best partition. The co-running programs obtained with our method outperform the original CPU-only and GPU-only programs by 34.5% and 20.9%, respectively.

(3) FinePar, an irregularity-aware fine-grained workload partitioning method for integrated architectures. FinePar considers the architectural differences between the CPU and the GPU and leverages the fine-grained collaboration enabled by integrated architectures. Through irregularity-aware performance modeling and online auto-tuning, FinePar partitions irregular workloads and achieves both device-level and thread-level load balance. We evaluate FinePar with 8 irregular applications on an AMD integrated architecture and compare it with state-of-the-art partitioning approaches. Results show that FinePar achieves better resource utilization and an average speedup of 1.38x over the optimal coarse-grained partitioning method.
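
To make the idea of a workload partition ratio from contribution (2) concrete, here is a minimal sketch of a throughput-proportional split. It is an illustration only and not the dissertation's profiling-based method, which this abstract does not describe in detail. The function names `balanced_cpu_share` and `corun_time` are hypothetical. The model assumes each device processes work at a constant rate, measured by profiling a CPU-only and a GPU-only run. It ignores the shared memory bandwidth contention that the abstract identifies as a limiting factor.

```python
def balanced_cpu_share(cpu_only_time: float, gpu_only_time: float) -> float:
    """Return the fraction of work to assign to the CPU so that, under an
    idealized constant-rate model, both devices finish at the same time.

    cpu_only_time / gpu_only_time: profiled time to run the whole workload
    on one device alone (assumed inputs, any consistent time unit).
    """
    if cpu_only_time <= 0 or gpu_only_time <= 0:
        raise ValueError("profiled times must be positive")
    cpu_rate = 1.0 / cpu_only_time  # workloads per unit time on the CPU
    gpu_rate = 1.0 / gpu_only_time  # workloads per unit time on the GPU
    return cpu_rate / (cpu_rate + gpu_rate)


def corun_time(cpu_only_time: float, gpu_only_time: float, cpu_share: float) -> float:
    """Estimated co-run time: the slower of the two devices determines
    when the whole workload is done (contention is NOT modeled)."""
    if not 0.0 <= cpu_share <= 1.0:
        raise ValueError("cpu_share must be in [0, 1]")
    return max(cpu_share * cpu_only_time, (1.0 - cpu_share) * gpu_only_time)


if __name__ == "__main__":
    # Hypothetical profile: the GPU is three times faster than the CPU.
    share = balanced_cpu_share(cpu_only_time=3.0, gpu_only_time=1.0)
    print(f"CPU share: {share:.2f}")
    print(f"Estimated co-run time: {corun_time(3.0, 1.0, share):.2f}")
```

Under this idealized model, co-running always looks at least as fast as the better single device. The benchmark findings above show that this does not hold in practice for most programs, which is why a prediction step that decides whether to co-run at all precedes the choice of ratio.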
Keywords/Search Tags:Integrated Architecture, Co-Running, Irregular Application, GPU, CPU