Font Size: a A A

Sparse Matrix Computation On Heterogeneous Architectures

Posted on:2014-02-12Degree:DoctorType:Dissertation
Country:ChinaCandidate:D ZouFull Text:PDF
GTID:1108330479979658Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Sparse matrix computation on heterogeneous architectures has become one of the key issues of high-performance computing. Sparse matrix computation is the key technology and performance bottleneck in the process of simulation and calculation of many application areas of natural science and social science. In order to improve the performance of sparse matrix computation, the computational efficiency needs to be improved on specific platforms. However, on one hand, sparse matrix computation is related with memory access pattern and sparse pattern of matrix and is one of the most typical irregular algorithms, thus it is difficult to explore the time and space locality. On the other hand, as the heterogeneous architectures with accelerators become the mainstream of high-performance computer architecture, the factors of the parallel program execution efficiency become more complex and the traditional parallel algorithms and optimization methods have been unable to meet the needs of the heterogeneous architectures.In order to address these issues and challenges, this paper focus on sparse matrix computation on heterogeneous architectures. The main work and innovations are as follows:(1) A direction-optimized breadth-first search algorithm is proposed for CPU-GPU heterogeneous platform.This paper proposed a GPU-based bottom-up breadth-first search algorithm to improve the memory access continuity and reduces the load imbalance of GPU threads. The CPU-GPU collaborative bottom-up breadth-first search algorithm is proposed to take advantage of all computing resources of the CPU-GPU heterogeneous parallel computing platforms, and effectively improve the algorithm for large-scale node frontiers. On this basis, a direction-optimized breadth-first search algorithm is proposed, based on a combination of CPU-based top-down serial/parallel breadth-first search algorithm and CPUGPU collaborative bottom-up breadth-first search algorithm, improving the adaptivity of algorithm for node frontiers of different sizes.(2) An optimized sparse matrix-vector multiplication algorithm for the CPU-GPU heterogeneous platform is proposed.This paper proposed a block storage data structure for sparse matrix, based on the compression of index information by merging the index of adjacent nonzero elements.This method improved the regularity of sparse matrix memory access and controlled the additional computational and storage overhead introduced by the fill-in zero elements.A two-layer data structure is proposed to improve the adaptability for different sparse structure. On this basis, a work load balancing algorithm and optimized Sp MV algorithms designed for multi-core CPU and GPU are proposed, which effectively improve computational efficiency of the sparse matrix-vector multiplication algorithm. Finally a CPU-GPU collaborative sparse matrix-vector multiplication algorithm is proposed.(3) A sparse matrix factorization supernodal algorithms for the CPU-GPU heterogeneous platform is proposed.This paper studies the sparse Cholesky factorization algorithm. The sparse matrix supernodal data structure is optimized by merging and blocking to tune the granularity of computational tasks. The factorization process is mapped to a series of tasks on blocks. The task generation and scheduling algorithm is proposed, improving the parallelism while meets the data dependency. By mapping the control tasks and computing tasks to the CPU and the GPU respectively, the performance of sparse Cholesky factorization is improved effectively for CPU-GPU heterogeneous platform.(4) An optimized sparse triangular solving algorithm for the CPU-GPU heterogeneous platform is proposed.This paper proposed the block-oriented sparse storage strategy and decomposed the process of the sparse triangular solving algorithm according to the density of nonzero elements. On this basis, a load-balancing algorithm is designed to map the computation to threads. A warp-based strategy for the organization of GPU threads is designed to reduce the overhead introduced by the control branches. Finally a CPU-GPU collaborative sparse triangular solving algorithm is proposed to take advantage of all computing resources of the CPU-GPU heterogeneous parallel computing platforms.
Keywords/Search Tags:sparse matrix, heterogeneous architecture, parallel computing
PDF Full Text Request
Related items