Font Size: a A A

Research Of Fast Parallel Algorithm For Sparse Linear Systems On CPU+GPU Heterogeneous Platforms

Posted on:2018-07-24Degree:DoctorType:Dissertation
Country:ChinaCandidate:W D YanFull Text:PDF
GTID:1310330542474510Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
With the extensive use of numerical simulation in scientific computing and engineering applications,many industries and fields have strong demands for the application and develop-ment of numerical simulation software.The solution of linear system is the core of numerical simulation.Due to the different characteristics of the linear system,the linear systems gener-ated in many fields,such as chemical process,heat conduction,circuit simulation and nuclear physics simulation,have the characteristics of sparse.However,with the complexity of the data grid and the irregularity of the boundary of the problem domain,the sparse patterns of the linear systems are sometimes more diverse.With the expansion of the scale of the problem and the refinement of the simulation,the order of the solution of the linear systems are getting higher and higher,and some even reach more than 10 billions,the performance of the solutions put forward higher requirements.But it is difficult to meet the performance requirements of solving such the high order sparse linear systems.At present,because of its economy and high efficiency,large scale heterogeneous comput-ing systems have gradually become important high-performance computing platforms,which are widely used in numerical simulation.For the diversity of high order sparse linear systems,a fast parallel algorithm for large scale sparse matrix compression and its core operation SpMV are studied on CPU+GPU heterogeneous platform,and for quasi diagonal and block-triangular sparse linear systems widely used in scientific computing and engineering applications,a hy-brid parallel algorithm with high performance and good robustness is proposed.According to the characteristics of the heterogeneous architecture,the heterogeneous parallelization strategy of the algorithm is studied from the aspects of data partition,task allocation and cooperative programming to give full play to the computational efficiency of heterogeneous computing sys-tems.This paper focuses on how to improve the performance of sparse linear systems on the popular CPU+GPU heterogeneous parallel computing platform.Firstly,a mathematical model which can describe and analyze the sparse feature of sparse matrix is constructed to compute SpMV,which is core operation for solving sparse linear equa-tions.The mathematical model can be used to analyze the computational performance of SpMV in the CPU+GPU heterogeneous parallel computing platform based on the hardware configu-ration.Based on the analysis and prediction of the computational performance of SpMV for different compression formats,an optimal compression format is selected to implement the Sp-MV computation on the CPU+GPU heterogeneous computing platform.The method proposed a solution to the hard problem which the efficiency of parallel computing of SpMV is not high because of the irregularity of computing data.Secondly,with the increase of the scale of the problem and the improvement of the pre-cision,the scale of the sparse matrix is becoming larger and larger.It is difficult to meet the performance requirements of SpMV by single node.It is especially necessary to divide the sparse matrix into blocks which are computed in parallel on multiple processors.Therefore,we propose a sparse matrix partitioning strategy for heterogeneous processor architectures,which can divide the sparse matrix into different blocks.These blocks are assigned to GPU and multi-core CPU to improve the computational performance of SpMV.This method can extract the dense data block to reduce the data filling based on the analysis of non-zero elements distribu-tion of sparse matrix,so it can adapt to different types of sparse matrix.Furthermore,the block strategy can improve the effect according to the computing power of heterogeneous processors,so as to realize the balance of computing tasks among heterogeneous processors and make full use of its computing resources.In addition,for a class of sparse linear systems-tridiagonal systems,The tridiagonal linear equations generated by the data grid with irregular boundaries are not standardized,and there are some discrete points beyond the diagonal leading to the performance degradation of the commonly used algorithm.We present a solving method which mixes direct and iterative meth-ods for the quasi-tridiagonal systems,and our method needs less storage space in a computing process.A quasi-tridiagonal matrix is split into a tridiagonal matrix and a sparse matrix using our method and then the tridiagonal equation can be solved by the direct methods in the iter-ation processes.Because the approximate solutions obtained by the direct methods are closer to the exact solutions,the convergence speed of solving the quasi-tridiagonal system of linear equations can be improved.Furthermore,we present an improved cyclic reduction algorithm using a partition strategy to solve tridiagonal equations on GPU,and the intermediate data in computing are stored in shared memory so as to significantly reduce the latency of memory access.The theoretical analysis and experimental results show that the algorithm is effective and scalable.In the end,we propose a hybrid block algorithm for another common block-tridiagonal systems,which can divided the block-tridiagonal matrix into different blocks to construct the parallel computing task sequence,and improve the efficiency of parallel computing by making full use of the parallel computing features of GPU thread grid.Our algorithm can not only improve the occupancy rate of GPU computing core,but also reduce the data access delay,and can solve the problem that the parallel efficiency of solving this kind of equations is very low.This algorithm can be applied to the resource scheduling of cloud computing center,which has a great effect on improving the performance of large-scale resource scheduling in cloud computing center.
Keywords/Search Tags:Sparse matrix, Linear system, SpMV, Solving algorithm, Heterogeneous platforms, Parallel computing
PDF Full Text Request
Related items