Parallel performance studies for a linear parabolic test problem using the Intel Xeon Phi

Posted on: 2017-08-15
Degree: M.S.
Type: Thesis
University: University of Maryland, Baltimore County
Candidate: Day, Ryan D
Full Text: PDF
GTID: 2458390005987260
Subject: Mathematics
Abstract/Summary:
The performance of parallel computer code depends on several factors, including the system hardware, the numerical algorithm chosen, and how the algorithm is implemented. We consider the parallel performance of a parabolic test problem on the CPUs of one or multiple nodes and on the Intel Xeon Phi in native and symmetric mode, using MPI only and hybrid MPI+OpenMP programming models.

We report the performance of a classical parabolic test problem whose structure is representative of kernels of real-world application codes. This test problem is the linear heat equation with homogeneous Dirichlet boundary conditions in two spatial dimensions on the unit square, which can be approximated using backward Euler for the time derivative and a centered finite difference approximation for the spatial derivatives in the diffusion term. The implementation of the conjugate gradient method for the iterative solution of the resulting linear system at each time step has the potential to scale well to many parallel processes. This test problem lies in complexity between linear stationary elliptic problems and non-linear transient parabolic problems. Analyzing its performance based on excellent results for the former problems will give guidance on the potential for good performance on the latter ones.

We report parallel performance studies for the 2013 portion of the maya cluster in the UMBC High Performance Computing Facility and the Stampede cluster in the Texas Advanced Computing Center. We conduct parallel performance studies with MPI and OpenMP on the CPUs only as well as on CPUs in combination with the Intel Xeon Phi. The results show good performance using MPI on CPUs for up to 32 compute nodes.
The results show that code with a high degree of parallelism is required to take advantage of the many cores of the Phi and to achieve better performance than on CPUs, and that, for code with a sufficiently high degree of parallelism, using both CPUs and Phis jointly on a hybrid node gives the best performance. The results also show that code with smaller mesh resolutions is compute-bound, while code with larger mesh resolutions is memory-bound.
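
The discretization described in the abstract can be illustrated with a small serial sketch. This is not the thesis code: it uses no MPI or OpenMP, and it assumes a zero source term, the initial condition u(x, y, 0) = sin(pi x) sin(pi y), and a matrix-free 5-point stencil, none of which are stated in the abstract. The function names (`apply_system`, `conjugate_gradient`, `heat_solve`) are hypothetical. Each backward Euler step solves (I + dt A) u_new = u_old with the conjugate gradient method, where A is the centered finite difference approximation of the negative Laplacian.

```python
import math


def apply_system(u, n, h, dt):
    """Return (I + dt*A) u without storing the matrix.

    A is the 5-point discrete negative Laplacian on an n-by-n grid of
    interior points; boundary values are zero (homogeneous Dirichlet).
    """
    out = [0.0] * (n * n)
    c = dt / (h * h)
    for j in range(n):
        for i in range(n):
            k = j * n + i
            s = 4.0 * u[k]
            if i > 0:
                s -= u[k - 1]
            if i < n - 1:
                s -= u[k + 1]
            if j > 0:
                s -= u[k - n]
            if j < n - 1:
                s -= u[k + n]
            out[k] = u[k] + c * s
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(matvec, b, x0, tol=1e-12, max_iter=1000):
    """Solve matvec(x) = b for a symmetric positive definite operator."""
    x = list(x0)
    r = [bi - ai for bi, ai in zip(b, matvec(x))]
    p = list(r)
    rr = dot(r, r)
    b_norm = math.sqrt(dot(b, b)) or 1.0
    for _ in range(max_iter):
        if math.sqrt(rr) <= tol * b_norm:
            break
        Ap = matvec(p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pj for xi, pj in zip(x, p)]
        r = [ri - alpha * aj for ri, aj in zip(r, Ap)]
        rr_new = dot(r, r)
        beta = rr_new / rr
        p = [ri + beta * pj for ri, pj in zip(r, p)]
        rr = rr_new
    return x


def heat_solve(n, dt, steps):
    """Advance the assumed initial condition by `steps` backward Euler steps."""
    h = 1.0 / (n + 1)
    # Assumed initial condition: sin(pi x) sin(pi y) at the interior points.
    u = [math.sin(math.pi * (i + 1) * h) * math.sin(math.pi * (j + 1) * h)
         for j in range(n) for i in range(n)]
    for _ in range(steps):
        # Right-hand side is the old solution (zero source term assumed);
        # the old solution also serves as the initial guess for CG.
        u = conjugate_gradient(lambda v: apply_system(v, n, h, dt), u, u)
    return u, h
```

For example, `u, h = heat_solve(15, 0.01, 5)` advances the solution on a 15-by-15 interior mesh by five time steps. In a parallel version, the matrix-vector product and the dot products are the operations that would need communication between processes, which is why the stencil is applied matrix-free here.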
Keywords/Search Tags: Performance, Using the Intel Xeon Phi, Parallel, Test problem, Results show, Mesh resolutions, Linear