
Communication-efficient parallel sparse Cholesky factorization

Posted on: 1996-04-06
Degree: Ph.D
Type: Dissertation
University: The Ohio State University
Candidate: Eswar, Kalluri
GTID: 1468390014486793
Subject: Computer Science
Abstract/Summary:
Performing the Cholesky factorization of large sparse matrices is an important computational component of several scientific and engineering applications. Using parallel computers to carry out such computations is becoming increasingly common and necessary for good performance. Cholesky factorization of a sparse matrix is an irregular computation when the sparsity structure is unstructured. Mapping the sparse matrix elements, and the operations constituting the factorization, onto the processors so as to achieve low interprocessor communication is an important problem. It is also necessary to organize the local computation on each processor so that it exhibits good data-cache behavior.

This dissertation considers the sparse Cholesky factorization mapping problem at several levels of granularity of the matrix unit that is mapped. At the column level, a heuristic technique called recursive partitioning for mapping columns of the matrix to processors is shown to result in significantly lower communication than a cyclic mapping. The technique is applied to various methods of organizing the factorization computation. At the supernodal level, a new algorithm is presented that simultaneously reduces interprocessor communication and processor idling; it also achieves good data-cache usage on each processor. A more sophisticated variant of this algorithm, which adapts to the amount of working storage available on each processor, is also developed. At the block level, new mapping strategies are presented that integrate techniques used for reducing communication in dense factorization with those used in sparse factorization. These strategies use a three-dimensional logical view of the processor system. A common framework for specifying these and previously proposed mapping strategies is developed.
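To make the column-level idea concrete, the following is a minimal sketch (not the dissertation's actual heuristic) contrasting a cyclic column mapping with a recursive-partitioning mapping in the subtree-to-subcube style: independent subtrees of the elimination tree are assigned to disjoint halves of the processor set, so column updates within a subtree stay within one processor group. The tree shape, processor counts, and function names here are illustrative assumptions; a binary elimination tree is assumed for brevity.

```python
def cyclic_mapping(n_cols, n_procs):
    """Cyclic mapping: column j goes to processor j mod p."""
    return {j: j % n_procs for j in range(n_cols)}

def recursive_mapping(tree, root, procs, mapping=None):
    """Recursive-partitioning sketch (subtree-to-subcube style):
    assign each independent subtree to a disjoint half of the
    processor group; small subtrees are mapped cyclically within
    their group.  `tree` maps a column to its child columns in the
    elimination tree (binary tree assumed for this illustration)."""
    if mapping is None:
        mapping = {}
    children = tree.get(root, [])
    if len(children) >= 2 and len(procs) >= 2:
        half = len(procs) // 2
        recursive_mapping(tree, children[0], procs[:half], mapping)
        recursive_mapping(tree, children[1], procs[half:], mapping)
        mapping[root] = procs[0]  # separator column stays in the group
    else:
        # Base case: map the whole subtree cyclically over the group.
        stack, i = [root], 0
        while stack:
            col = stack.pop()
            mapping[col] = procs[i % len(procs)]
            i += 1
            stack.extend(tree.get(col, []))
    return mapping

# Example elimination tree: column 6 is the root separator,
# columns {0,1,2} and {3,4,5} form two independent subtrees.
tree = {6: [2, 5], 2: [0, 1], 5: [3, 4]}
```

Because the two subtrees land on disjoint processor halves, updates between columns of the same subtree require no communication across the halves, which is the source of the communication reduction relative to a cyclic mapping.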
Practical implementations of the new mapping strategies using the BLAS (Basic Linear Algebra Subprograms) are described.

Experimental performance results for the proposed algorithms are presented using sample sparse matrices drawn primarily from the Harwell-Boeing collection; the parallel machine used is a 32-processor Cray T3D. The results demonstrate the efficacy of the techniques developed here and bring out several interesting tradeoffs among the factors affecting the performance of the factorization computation.
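As background for the BLAS-based implementations, the sketch below is a plain scalar Cholesky factorization (lower-triangular L with A = L·Lᵀ), with comments marking where a block implementation would instead invoke LAPACK/BLAS kernels such as DPOTRF (diagonal-block factorization), DTRSM (triangular solve), and DGEMM/DSYRK (trailing update). This is a generic illustration, not the dissertation's code.

```python
import math

def cholesky(a):
    """Scalar dense Cholesky: returns lower-triangular L with A = L L^T.
    In a blocked BLAS-based version, each marked step operates on a
    block rather than a scalar."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: in block form this is DPOTRF on the diagonal block.
        s = a[j][j] - sum(L[j][k] * L[j][k] for k in range(j))
        L[j][j] = math.sqrt(s)
        for i in range(j + 1, n):
            # Subdiagonal entries: in block form, DGEMM for the update
            # sum and DTRSM against the factored diagonal block.
            t = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = t / L[j][j]
    return L
```

Casting the bulk of the work as Level-3 BLAS calls on dense blocks is what yields good data-cache behavior on each processor, since each block is reused many times once loaded.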
Keywords/Search Tags: Factorization, Sparse, Cholesky, Computation, Communication, Mapping strategies, Parallel