Font Size: a A A

Efficient and portable parallel algorithms for Cholesky decomposition

Posted on:2004-12-08Degree:Ph.DType:Dissertation
University:Lehigh UniversityCandidate:Chu, Pei Yue LiuFull Text:PDF
GTID:1468390011976356Subject:Computer Science
Abstract/Summary:
In this dissertation we consider design issues such as data, layout, data, dependency, communication scheduling, and parallel models in order to develop parallel algorithms for Cholesky decomposition. Our intention is to design efficient and portable parallel algorithms for large scale dense linear systems to be run in the distributed memory environment.; To consider the issue of portability, we utilize the well known LogP model for designing parallel algorithms and for performing theoretical run time analysis. We developed a contention free communication scheduling for the parallel algorithms in order to reduce the communication cost. To study the impact of data layout on performance, we designed the parallel algorithms based on column, row and block data distribution utilizing the fan-in and fan-out methods. Specifically we introduced panel size for 1D layouts and granularity size for 2D layouts for the task partitioning. To determine the efficiency of the parallel algorithms, we derived lower bound results and theoretical run time performance. The analysis on the parallel algorithms shows that parallel algorithms for 1D data layouts are optimal. However for 2D data layouts, the parallel algorithms are efficient, if not necessarily asymptotically optimal.; We then implemented the parallel algorithms using MPI. The experimental results show that fan-in outperforms fan-out for 2D layouts, especially for larger processor size. The performance for the 1D fan-out row-based parallel algorithm was no worse than that of the column-based parallel algorithms. This seems to refute the common notion that row-based parallel algorithms are not, as efficient as column-based parallel algorithms. The experimental results indicated that performance was impacted as panel size and/or granularity size varied. This provides insights that the constant factors for lower bound and run time performance call not be ignored.
Keywords/Search Tags:Parallel, Run time, Data, Efficient, Performance, Size
Related items