Research Of Communication Interface For Parallel Programming On GPU Cluster

Posted on:2013-01-20

Degree:Master

Type:Thesis

Country:China

Candidate:J X Ceng

Full Text:PDF

GTID:2248330392457823

Subject:Computer software and theory

Abstract/Summary:

GPUs are well suited to large-scale, data-intensive, and data-parallel processing, and CUDA (Compute Unified Device Architecture) has made them more widely used in general-purpose computing. Because of their high performance-to-cost ratio, GPU clusters are widely used for high-performance computing. However, there is no standard parallel programming model for GPU clusters. Most applications are written with a combination of CUDA and MPI. Both are difficult to program because they place high technical demands on programmers, who must be familiar not only with GPUs and CUDA programming but also with cluster computing and message passing. While programming, they must explicitly control communication between host and device memory and between the memories of different nodes. This makes programming difficult and laborious.

CUDAGA, a communication interface for GPU clusters, combines the features of GA (Global Arrays, a shared-memory programming model on distributed memory) and CUDA. CUDAGA provides GPU-to-GPU communication interfaces through a virtual global shared address space and can maintain data consistency between global arrays on the CPU and the GPU. It helps users select the correct GPU device in multi-process, multi-GPU environments, and it helps them monitor the GPU cluster by providing information query functions and graphical monitoring interfaces. Moreover, CUDAGA improves the performance of GA's linear algebra functions by optimizing data transfer and the compute kernels; users can call the accelerated functions directly. CUDAGA thus provides a portable communication interface for parallel programming on GPU clusters that simplifies programming while preserving performance, thereby improving programming efficiency on GPU clusters.

CUDAGA was tested on a parallel matrix multiplication algorithm, Cannon's algorithm, and on the Jacobi iteration algorithm. The results show that the CUDAGA model is suited to parallel programming of array operations on basic data structures with heavy data communication and many read operations. Compared with CUDA+MPI, CUDAGA reduces code length by more than half while delivering better performance and efficiency.
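
As an illustration of the communication pattern behind the Cannon test case, the sketch below simulates Cannon's algorithm in a single Python process. It is not CUDAGA code, and the abstract does not show the CUDAGA API; the function names (`split`, `join`, `cannon`) are hypothetical. Each entry of the q x q block grid stands in for one process or GPU, and the list re-indexing stands in for the block shifts that a CUDA+MPI program would perform with explicit messages and host-device copies.

```python
def matmul(a, b):
    """Plain triple-loop matrix product, used for each local block."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def add(a, b):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def split(m, q):
    """Cut an n x n matrix into a q x q grid of (n/q) x (n/q) blocks."""
    s = len(m) // q
    return [[[row[j * s:(j + 1) * s] for row in m[i * s:(i + 1) * s]]
             for j in range(q)] for i in range(q)]


def join(blocks):
    """Reassemble a q x q grid of blocks into one matrix."""
    rows = []
    for block_row in blocks:
        for r in range(len(block_row[0])):
            rows.append([x for block in block_row for x in block[r]])
    return rows


def cannon(a, b, q):
    """Multiply square matrices a and b on a simulated q x q process grid."""
    n = len(a)
    assert n % q == 0, "matrix size must be divisible by the grid size"
    s = n // q
    A, B = split(a, q), split(b, q)
    # Initial skew: block row i of A shifts left by i,
    # block column j of B shifts up by j.
    A = [[A[i][(j + i) % q] for j in range(q)] for i in range(q)]
    B = [[B[(i + j) % q][j] for j in range(q)] for i in range(q)]
    C = [[[[0] * s for _ in range(s)] for _ in range(q)] for _ in range(q)]
    for _ in range(q):
        # Local compute step: every grid position multiplies its two blocks.
        for i in range(q):
            for j in range(q):
                C[i][j] = add(C[i][j], matmul(A[i][j], B[i][j]))
        # Communication step: A shifts left by one, B shifts up by one.
        A = [[A[i][(j + 1) % q] for j in range(q)] for i in range(q)]
        B = [[B[(i + 1) % q][j] for j in range(q)] for i in range(q)]
    return join(C)


if __name__ == "__main__":
    a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    b = [[2, 0, 1, 3], [1, 1, 0, 2], [4, 5, 6, 7], [0, 3, 2, 1]]
    # Compare the blocked result with a direct product.
    print(cannon(a, b, 2) == matmul(a, b))
```

In a real cluster run, the two shift lines are where the inter-node and host-device transfers described in the abstract would occur; the local block product is the part that would run as a GPU kernel.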

Keywords/Search Tags:

GPU Cluster, Parallel Programming, Cluster Communication, Global Arrays

Related items

1	Research Of Parallel Communication Technique On Cluster Network
2	Parallel Design And Implementation Of FFT Algorithm Based On MPI And Linux Cluster Environment
3	Research Of Parallel Communication Protocol Based On Intelligence NICs
4	Based On Application And Optimization Of SMP Cluster Parallel Programming Model
5	Parallel Programming With Communication Efficiency On MIC-Enhanced Cluster
6	Study On Hybrid Parallel Molecular Dynamics Computing On Multicore Cluster
7	Research Of Multi-level Parallelism Programming Pattern For Hybrid Parallel Computing Environment
8	A Study Of MPI And OpenMP Parallel Programming Based On SMP Cluster
9	Research On The Optimization Method Of Group Communication On Cluster System
10	A Study Of Efficient Parallel FDTD Methods On Cluster Systems