
Parallel Programming Model Of Heterogeneous GPU Cluster And Implementation

Posted on: 2014-08-01
Degree: Master
Type: Thesis
Country: China
Candidate: J J Li
GTID: 2298330434972515
Subject: Computer system architecture
Abstract/Summary:
GPU clusters have become a mainstream platform and research focus in modern high-performance computing because of their powerful compute capability, support for large-scale data-level parallelism, and high memory-access bandwidth. However, adding GPUs makes the cluster architecture considerably more complex: heterogeneity appears both across nodes and within each node, so the cluster's parallel computing capability splits into multiple levels of different kinds, namely parallelism between nodes, parallelism between the CPU and the GPU within a node, and data parallelism within the GPU. This greatly increases the complexity of deploying and running parallel programs. No ready-made programming model fully fits this architecture. Mainstream heterogeneous GPU cluster systems mostly combine a GPU programming model for heterogeneous computing with the message-passing model for distributed memory (MPI) in a loosely coupled way: the programmer describes the application as a set of interacting MPI processes and deploys them across the nodes. A serial MPI process runs on a node's CPU, while an MPI process that contains data-parallel work runs on a node that contains a GPU. This approach, however, offers no standard for how to partition the application or how many parallel processes to create, so it is hard to guarantee that subtasks containing data-parallel work are assigned to GPU-equipped nodes whose computing power matches the data-parallel computation. Moreover, MPI+CUDA is not itself a well-founded programming model and is not analyzable: unlike serial execution, parallel and distributed execution is nondeterministic, so a programming model is needed with which the program's behavior can be analyzed to guarantee correctness in all circumstances and to rule out phenomena such as deadlock and livelock, and MPI+CUDA provides no such model.
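The placement problem described above, assigning data-parallel subtasks to GPU-equipped nodes, can be sketched in a few lines. This is an illustrative example only, not part of the thesis; the `Node`, `Subtask`, and `assign` names are invented for this sketch, which models a greedy placement rule under the assumption that each node hosts one subtask.

```python
# Illustrative sketch (not from the thesis): placing MPI subtasks on a
# heterogeneous cluster so data-parallel subtasks land on GPU nodes.
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    has_gpu: bool

@dataclass
class Subtask:
    name: str
    data_parallel: bool  # True if the subtask contains GPU-suitable parallelism

def assign(subtasks, nodes):
    """Greedy placement: data-parallel subtasks take GPU nodes first;
    serial subtasks fill the CPU-only nodes (falling back to GPU nodes)."""
    gpu_nodes = [n for n in nodes if n.has_gpu]
    cpu_nodes = [n for n in nodes if not n.has_gpu]
    placement = {}
    for t in subtasks:
        pool = gpu_nodes if t.data_parallel else (cpu_nodes or gpu_nodes)
        node = pool.pop(0)  # raises IndexError if capacity is exhausted
        placement[t.name] = node.name
    return placement

# Example: the data-parallel subtask is steered to the GPU node.
placement = assign(
    [Subtask("fft", True), Subtask("logging", False)],
    [Node("n0", False), Node("n1", True)],
)
# placement == {"fft": "n1", "logging": "n0"}
```

The point of the sketch is that MPI itself offers no such rule; the mapping must be enforced by convention or tooling outside the programming model.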
Exploiting the multiple levels of parallel computing capability in a GPU cluster requires asynchronous control, which is complex: MPI+CUDA processes synchronize through messages, which can cause long waits and seriously hurt performance, while implementing asynchronous control by hand forces the programmer to manage inter-process data-transfer buffering and complex context, making programming very difficult. To solve these problems, we design and implement DISPAR, a new programming framework based on the dataflow graph (DFG) that fits the heterogeneous GPU cluster architecture. The basic idea is to describe the application with a dataflow model: taking data streams as the core, the programmer divides the application into a set of asynchronously running VNODEs, each a data-parallel subtask (process) that reflects the GPU's parallel computing capability. The DFG is one of the mainstream models for describing data-intensive applications, and its support for hierarchical description makes it well suited to complex applications. Compared with other computation models, the DFG expresses an application's parallelism explicitly and describes the application in a data-centric way in the most natural manner, and it is widely recognized as the computation model that best reflects an application's data parallelism. In addition, VNODEs use data-driven control to maximize concurrent asynchronous operation and reduce the performance impact of synchronization: each node runs asynchronously, global control is abandoned, and synchronization requirements are minimized.
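The data-driven execution described above can be modeled compactly. The following sketch borrows the thesis's VNODE/PIPE terminology but the implementation is entirely ours: each stage runs in its own thread and fires only when data arrives on its input pipe, with no global synchronization between stages.

```python
# Illustrative dataflow sketch: asynchronous "VNODEs" connected by "PIPEs"
# (modeled here as thread-safe queues). Each node fires when input arrives.
import threading
import queue

class Pipe(queue.Queue):
    """A PIPE carries data between VNODEs; None is a shutdown sentinel."""

def vnode(fn, inp: Pipe, out: Pipe) -> threading.Thread:
    """Start a VNODE that applies fn to each item flowing through it."""
    def run():
        while True:
            item = inp.get()      # blocks until data arrives (data-driven)
            if item is None:      # propagate shutdown downstream
                out.put(None)
                break
            out.put(fn(item))
    t = threading.Thread(target=run, daemon=True)
    t.start()
    return t

# Two-stage pipeline: square, then increment. The stages overlap in time;
# no stage waits for a global barrier.
p_in, p_mid, p_out = Pipe(), Pipe(), Pipe()
vnode(lambda x: x * x, p_in, p_mid)
vnode(lambda x: x + 1, p_mid, p_out)

for x in [1, 2, 3]:
    p_in.put(x)
p_in.put(None)

results = []
while (item := p_out.get()) is not None:
    results.append(item)
# results == [2, 5, 10]
```

In the real framework each VNODE would be a process on a cluster node (possibly running CUDA kernels) rather than a thread, but the control principle, blocking on input rather than on explicit message synchronization, is the same.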
The DISPAR programming framework replaces the arbitrary, blind explicit process partitioning of the MPI programming model: it automatically generates MPI+CUDA processes with little coupling to the underlying platform, bringing portability, scalability, and other advantages. The DISPAR programming model is realized as a language extension: two simple constructs, VNODE and PIPE, describe the system-level structure of an application, forming an extension of the C language. The subtasks of a DISPAR program run asynchronously, and the basic operations of asynchronous control, such as context management and buffering, are packaged in a standardized way, which makes asynchronous programming easy to use and keeps applications compact. To realize this new language without redesigning a compiler, this paper proposes a source-to-source conversion method: the StreamCC preprocessor translates a DISPAR program into an MPI+CUDA program that conforms to certain conventions, laying the foundation for effective deployment and asynchronous execution on a GPU cluster. Practical application shows that the DISPAR programming model achieves good results.
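A source-to-source preprocessor of the kind described can be illustrated with a toy generator. This is not StreamCC's actual behavior or output format, both are invented here; the sketch only shows the general idea of mapping declared VNODEs to MPI ranks and emitting dispatch boilerplate.

```python
# Toy sketch in the spirit of a source-to-source preprocessor: map each
# declared VNODE to an MPI rank and emit C-style dispatch boilerplate.
# The output format is invented for illustration only.
def emit_dispatch(vnodes):
    """Return a C-style skeleton that routes each rank to its VNODE body."""
    lines = ["int rank; MPI_Comm_rank(MPI_COMM_WORLD, &rank);"]
    for rank, name in enumerate(vnodes):
        lines.append(f"if (rank == {rank}) run_{name}();  /* VNODE {name} */")
    return "\n".join(lines)

print(emit_dispatch(["read", "filter", "reduce"]))
```

The generated code is ordinary MPI+CUDA source, so it compiles with existing toolchains; the preprocessor, not the programmer, decides the process partitioning.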
Keywords/Search Tags: Heterogeneous GPU Clusters, Parallel Computing, Programming Model, Preprocessor