Research Of Parallel Computing On CPU/GPU Heterogeneous Architecture

Posted on:2013-06-08

Degree:Doctor

Type:Dissertation

Country:China

Candidate:F S Lu

Full Text:PDF

GTID:1268330392473879

Subject:Computer science and technology

Abstract/Summary:


Driven by the combined effects of semiconductor technology, manufacturing processes and power consumption, the high performance computing (HPC) community has seen diverse processor architectures and many kinds of parallel computers. In the era of green computing, CPU/GPU heterogeneous HPC systems offer a tradeoff among versatility, performance and effectiveness, which gives them a very promising future. Large CPU/GPU heterogeneous systems have immense computing power and provide a good opportunity for large-scale scientific and engineering applications. However, their complex hardware structure and specific execution scheme pose a major challenge for parallel computing researchers.

Parallel computing involves many research topics. We concentrate on three of them: the parallel computation model, the parallel programming model and the parallel scalability model. A parallel computation model is an abstraction of the underlying parallel computer system that reflects its resources and performance characteristics through a small set of parameters. It acts as a bridge between software and hardware for parallel algorithm designers. A parallel programming model is a collection of program abstractions that gives parallel programmers a transparent view of the computer software/hardware system. A parallel scalability model describes the scalability of a parallel system as the system size or problem size changes. CPU/GPU heterogeneous HPC systems have specific structural characteristics and performance factors that the existing models cannot describe accurately. Hence, there is an urgent need for parallel computing research targeting such HPC systems, which can support current and future parallel application development on these platforms.

In this thesis, we address the parallel computation model, the parallel programming model and the parallel scalability model for large-scale CPU/GPU heterogeneous HPC systems.
The main contributions are as follows:

(1) We perform a comprehensive and systematic survey of related work on several key technologies in the parallel computing community. After discussing the characteristics and the future of parallel computer architectures, we survey related work on the parallel computation model, the parallel programming model and the parallel scalability model, and perform a comparative analysis of several instances of each of the three models.

(2) We propose a parallel computation model named HLognGPM for large-scale CPU/GPU heterogeneous HPC systems. The model can effectively describe the computing power and the various communication behaviors of CPU/GPU heterogeneous systems. It includes six parameters: the latency, the overhead, the message interval, the number of atomic communications, the per-byte interval and the processor performance. Note that n, the number of atomic communications, determines the complexity and accuracy of the model. After analyzing the complexity of HLognGPM, the simplified version HLog3GPM is mapped onto the TH-1A system, and all the platform-specific parameters are determined for TH-1A. Extensive experiments with NPB-EP and NPB-CG show that the HLog3GPM model has the highest prediction accuracy among the five parallel computation models compared.

(3) We propose a hybrid parallel programming model, MPI+OpenMP/CUDA, for CPU/GPU heterogeneous HPC systems. Compared with the MPI+CUDA model, it can fully exploit the enormous computing power of CPU/GPU heterogeneous systems. The MPI component performs the inter-node message passing operations, and the OpenMP and CUDA components exploit the computing power of multi-core CPUs and many-core GPUs, respectively. Experimental results show that the proposed hybrid model has a large performance advantage, especially for embarrassingly parallel applications.

(4) We propose a collaboration-aware parallel scalability model to describe the scalability of a parallel algorithm-GPU cluster combination. While scaling the combination, we keep the ratio of parallel computing overhead to collaborative overhead unchanged, and investigate the effect of system scale, problem size and collaborative overhead on the scalability of the combination. Extensive experiments show that the model describes the scalability of the combination well. The model can help developers find a better combination of parallel algorithms and GPU clusters, and predict the performance of larger-scale combinations from smaller ones.

(5) We port the long-wave radiation scheme to large CPU/GPU heterogeneous HPC systems, and obtain several guidelines for domain scientists to accelerate their legacy code. With almost the same accuracy, GPU computing can significantly enhance the efficiency of long-wave radiative transfer simulation. Numerical experiments show that the hybrid RRTM has good strong scalability and collaboration-aware scalability.
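
To illustrate how a parameterized computation model of the kind described in contribution (2) turns a few platform parameters into a time prediction, here is a minimal sketch based on the classic LogGP point-to-point cost, o + (k-1)G + L + o for a k-byte message. It is not the HLognGPM formulation, which the abstract does not give. The `Link` class, the function names and the idea of summing the cost over a chain of segments (for example GPU to host, host to host, host to GPU) are assumptions made only for this example. The message interval g is omitted because it only matters when several messages are sent back to back.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Link:
    """Hypothetical LogGP-style parameters for one communication segment.

    All values share one time unit (for example microseconds).
    """
    L: float  # latency of the segment
    o: float  # software overhead paid once by the sender and once by the receiver
    G: float  # gap per byte, the inverse of the segment bandwidth


def message_time(link: Link, nbytes: int) -> float:
    """Classic LogGP cost of a single nbytes-long message on one segment."""
    if nbytes < 1:
        raise ValueError("nbytes must be at least 1")
    # sender overhead + per-byte gaps + wire latency + receiver overhead
    return link.o + (nbytes - 1) * link.G + link.L + link.o


def path_time(links: Iterable[Link], nbytes: int) -> float:
    """Assumed extension: a transfer that crosses several segments in sequence
    costs the sum of the per-segment LogGP times (no overlap is modeled)."""
    return sum(message_time(link, nbytes) for link in links)
```

Calling `path_time` with one `Link` per hop gives a rough end-to-end estimate; a real model for a machine such as TH-1A would need measured parameters for each hop and would have to account for overlap between hops.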

Keywords/Search Tags:

CPU/GPU heterogeneous systems, Parallel computation model, Parallel programming model, Parallel scalability model, Long-wave radiation scheme


Related items

1	Study On Parallel Programming Models
2	Parallel Computing Scalability Studies And Applications On The Distributed Memory Environments
3	Research On Programming Model And Compiler Optimizations For CPU-GPU Heterogeneous Parallel Systems
4	Research On Programming Models And Optimizations For Petascale CPU-GPU Heterogeneous Computing Systems
5	Parallel Programming Model Of Heterogeneous GPU Cluster And Implementation
6	Pattern Of Parallel Programming Research
7	Research And Implementation Of Parallel Architectural Skeleton Based Parallel Programming Environment
8	Key Techniques Research On Unified Programming Environment For Heterogeneous Parallel Systems
9	Key Techniques Research On Multi-device Cooperative Parallel Computing For New-type Heterogeneous Many-core Systems
10	The Design And Implementation Of A Platform-independent Parallel Programming Model