
Communication-efficient parallel sparse Cholesky factorization

Posted on: 1996-04-06
Degree: Ph.D
Type: Dissertation
University: The Ohio State University
Candidate: Eswar, Kalluri
GTID: 1468390014486793
Subject: Computer Science
Abstract/Summary:
Performing the Cholesky factorization of large sparse matrices is an important computational component of several scientific and engineering applications. Using parallel computers to carry out such computations is becoming increasingly common and necessary for good performance. Cholesky factorization of a sparse matrix is an irregular computation when the sparsity structure is unstructured. Mapping the sparse matrix elements, and the operations constituting the factorization, onto the processors so as to achieve low interprocessor communication is an important problem. It is also necessary to organize the local computation on each processor so that it exhibits good data-cache behavior.

This dissertation considers the sparse Cholesky factorization mapping problem at several levels of granularity of the matrix unit that is mapped. At the column level, a heuristic technique called recursive partitioning for mapping columns of the matrix to processors is shown to result in significantly lower communication than a cyclic mapping. The technique is applied to various methods of organizing the factorization computation. At the supernodal level, a new algorithm is presented that simultaneously reduces interprocessor communication and processor idling; it also achieves good data-cache usage on each processor. A more sophisticated variant of this algorithm, which adapts to the amount of working storage available on each processor, is also developed. At the block level, new mapping strategies are presented that integrate techniques used for reducing communication in dense factorization with those used in sparse factorization. These strategies use a three-dimensional logical view of the processor system. A common framework for specifying these and previously proposed mapping strategies is developed.
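To make the column-level idea concrete, the following is a minimal sketch (not the dissertation's actual heuristic) contrasting a cyclic column mapping with a recursive-partitioning mapping in the subtree-to-subcube style: independent subtrees of the elimination tree are assigned to disjoint halves of the processor set, so column updates within a subtree stay within one processor group. The tree shape, processor counts, and function names here are illustrative assumptions; a binary elimination tree is assumed for brevity.

```python
def cyclic_mapping(n_cols, n_procs):
    """Cyclic mapping: column j goes to processor j mod p."""
    return {j: j % n_procs for j in range(n_cols)}

def recursive_mapping(tree, root, procs, mapping=None):
    """Recursive-partitioning sketch (subtree-to-subcube style):
    assign each independent subtree to a disjoint half of the
    processor group; small subtrees are mapped cyclically within
    their group.  `tree` maps a column to its child columns in the
    elimination tree (binary tree assumed for this illustration)."""
    if mapping is None:
        mapping = {}
    children = tree.get(root, [])
    if len(children) >= 2 and len(procs) >= 2:
        half = len(procs) // 2
        recursive_mapping(tree, children[0], procs[:half], mapping)
        recursive_mapping(tree, children[1], procs[half:], mapping)
        mapping[root] = procs[0]  # separator column stays in the group
    else:
        # Base case: map the whole subtree cyclically over the group.
        stack, i = [root], 0
        while stack:
            col = stack.pop()
            mapping[col] = procs[i % len(procs)]
            i += 1
            stack.extend(tree.get(col, []))
    return mapping

# Example elimination tree: column 6 is the root separator,
# columns {0,1,2} and {3,4,5} form two independent subtrees.
tree = {6: [2, 5], 2: [0, 1], 5: [3, 4]}
```

Because the two subtrees land on disjoint processor halves, updates between columns of the same subtree require no communication across the halves, which is the source of the communication reduction relative to a cyclic mapping.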
Practical implementations of the new mapping strategies using the BLAS (Basic Linear Algebra Subprograms) are described.

Experimental performance results for the proposed algorithms are presented using sample sparse matrices drawn primarily from the Harwell-Boeing collection; the parallel machine used is a 32-processor Cray T3D. The results demonstrate the efficacy of the techniques developed here and bring out several interesting tradeoffs among the factors affecting the performance of the factorization computation.
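As background for the BLAS-based implementations, the sketch below is a plain scalar Cholesky factorization (lower-triangular L with A = L·Lᵀ), with comments marking where a block implementation would instead invoke LAPACK/BLAS kernels such as DPOTRF (diagonal-block factorization), DTRSM (triangular solve), and DGEMM/DSYRK (trailing update). This is a generic illustration, not the dissertation's code.

```python
import math

def cholesky(a):
    """Scalar dense Cholesky: returns lower-triangular L with A = L L^T.
    In a blocked BLAS-based version, each marked step operates on a
    block rather than a scalar."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: in block form this is DPOTRF on the diagonal block.
        s = a[j][j] - sum(L[j][k] * L[j][k] for k in range(j))
        L[j][j] = math.sqrt(s)
        for i in range(j + 1, n):
            # Subdiagonal entries: in block form, DGEMM for the update
            # sum and DTRSM against the factored diagonal block.
            t = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = t / L[j][j]
    return L
```

Casting the bulk of the work as Level-3 BLAS calls on dense blocks is what yields good data-cache behavior on each processor, since each block is reused many times once loaded.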
Keywords/Search Tags: Factorization, Sparse, Cholesky, Computation, Communication, Mapping strategies, Parallel