Research Of Communication Interface For Parallel Programming On GPU Cluster

Posted on: 2013-01-20
Degree: Master
Type: Thesis
Country: China
Candidate: J X Ceng
Full Text: PDF
GTID: 2248330392457823
Subject: Computer software and theory
Abstract/Summary:
GPUs are well suited to processing large-scale, data-intensive, data-parallel workloads, and CUDA (Compute Unified Device Architecture) has made them more widely used in general-purpose computing. Because of their high cost-performance ratio, GPU clusters are widely used for high-performance computing. However, there is no standard parallel programming model for GPU clusters. Most applications are programmed with a combination of CUDA and MPI. Both are difficult to use because they place high technical demands on programmers, who must be familiar not only with GPU and CUDA programming but also with cluster computing and message passing. While programming, they must explicitly control communication between host memory and device memory, and between the memories of different nodes. Programming a GPU cluster is therefore difficult, laborious work.

CUDAGA, a communication interface for GPU clusters, combines the features of GA (Global Arrays, a shared-memory programming model for distributed memory) and CUDA. CUDAGA provides GPU-to-GPU communication interfaces through a virtual global shared address space and maintains data consistency between global arrays on the CPU and the GPU. It helps users select the correct GPU device in multi-process, multi-GPU environments. It also helps users monitor a GPU cluster by providing information query functions and graphical monitoring interfaces. Moreover, CUDAGA improves the performance of GA's linear algebra functions by optimizing data transfer and compute kernels; users can call the accelerated functions directly. CUDAGA thus provides a portable communication interface for parallel programming on GPU clusters that simplifies programming while preserving performance, and so improves programming efficiency on GPU clusters.

CUDAGA was tested on Cannon's algorithm for parallel matrix multiplication and on the Jacobi iteration algorithm. The results show that the CUDAGA model is suitable for parallel programming of array operations on basic data structures with massive data communication and massive read operations. Compared with CUDA+MPI, CUDAGA reduces code length by more than half while delivering better performance and efficiency.
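
For readers unfamiliar with the first test case, the following is a minimal serial sketch of the block-shifting pattern in Cannon's algorithm. It is not CUDAGA code and does not use the GA, CUDA, or MPI APIs: the q x q "process grid" is simulated with nested Python lists, and the function names (`cannon_matmul`, `split_blocks`, `join_blocks`, `matmul_add`) are hypothetical, chosen for illustration only. On a real cluster, each block shift would be a data transfer between nodes.

```python
def matmul_add(c, a, b):
    """Accumulate c += a * b for square blocks stored as lists of lists."""
    n = len(a)
    for i in range(n):
        for k in range(n):
            aik = a[i][k]
            for j in range(n):
                c[i][j] += aik * b[k][j]


def split_blocks(m, q):
    """Split an n x n matrix into a q x q grid of (n/q) x (n/q) blocks."""
    bs = len(m) // q
    return [[[row[j * bs:(j + 1) * bs] for row in m[i * bs:(i + 1) * bs]]
             for j in range(q)] for i in range(q)]


def join_blocks(blocks):
    """Reassemble a q x q grid of blocks into one matrix."""
    q = len(blocks)
    bs = len(blocks[0][0])
    return [[blocks[i][j][r][c] for j in range(q) for c in range(bs)]
            for i in range(q) for r in range(bs)]


def cannon_matmul(a, b, q):
    """Multiply square matrices a and b on a simulated q x q process grid."""
    n = len(a)
    if n % q != 0:
        raise ValueError("matrix size must be divisible by the grid size")
    bs = n // q
    A = split_blocks(a, q)
    B = split_blocks(b, q)
    C = [[[[0] * bs for _ in range(bs)] for _ in range(q)] for _ in range(q)]

    # Initial skew: block row i of A shifts left by i positions,
    # block column j of B shifts up by j positions.
    A = [[A[i][(j + i) % q] for j in range(q)] for i in range(q)]
    B = [[B[(i + j) % q][j] for j in range(q)] for i in range(q)]

    for _ in range(q):
        # Each simulated process multiplies the two blocks it currently holds.
        for i in range(q):
            for j in range(q):
                matmul_add(C[i][j], A[i][j], B[i][j])
        # Shift every A block one step left and every B block one step up.
        # On a cluster this is the inter-node communication step.
        A = [[A[i][(j + 1) % q] for j in range(q)] for i in range(q)]
        B = [[B[(i + 1) % q][j] for j in range(q)] for i in range(q)]

    return join_blocks(C)
```

The sketch makes visible why the algorithm is communication-heavy: every one of the q rounds moves all blocks of both input matrices, which is the kind of repeated array transfer the abstract describes as the target workload.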
Keywords/Search Tags: GPU Cluster, Parallel Programming, Cluster Communication, Global Arrays