Parallel performance studies for a linear parabolic test problem using the Intel Xeon Phi

Posted on: 2017-08-15

Degree: M.S.

Type: Thesis

University: University of Maryland, Baltimore County

Candidate: Day, Ryan D.


GTID: 2458390005987260

Subject: Mathematics

Abstract/Summary:

The performance of parallel computer code depends on several factors, including the system hardware, the numerical algorithm chosen, and how the algorithm is implemented. We consider the parallel performance of a parabolic test problem on the CPUs of one or more nodes and on the Intel Xeon Phi in native and symmetric modes, using both pure MPI and hybrid MPI+OpenMP programming models.

We report the performance of a classical parabolic test problem whose structure is representative of kernels of real-world application codes. The test problem is the linear heat equation with homogeneous Dirichlet boundary conditions in two spatial dimensions on the unit square. It is discretized with the backward Euler method in time and centered finite differences for the spatial derivatives in the diffusion term, which yields a large, sparse, symmetric positive definite system of linear equations at each time step. An implementation of the conjugate gradient method for the iterative solution of this system has the potential to scale well to many parallel processes. In complexity, this test problem lies between linear stationary elliptic problems and nonlinear time-dependent parabolic problems; since excellent performance has been obtained for the former, analyzing its performance gives guidance on the potential for good performance on the latter.

We report parallel performance studies on the 2013 portion of the maya cluster in the UMBC High Performance Computing Facility and on the Stampede cluster at the Texas Advanced Computing Center, using MPI and OpenMP both on the CPUs alone and on the CPUs combined with the Intel Xeon Phi. The results show good performance using MPI on CPUs for up to 32 compute nodes. They also show that code with a high degree of parallelism is required to take advantage of the many cores of the Phi and to outperform the CPUs, and that, for code with a sufficiently high degree of parallelism, using the CPUs and Phis jointly on a hybrid node gives the best performance. Finally, the code is compute-bound at smaller mesh resolutions and memory-bound at larger ones.
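Written out, the discretization named in the abstract (backward Euler in time, centered finite differences in space) leads to one linear system per time step. The form below is a sketch under assumptions not stated in this summary: a unit diffusion coefficient, a source term f, and an N x N mesh of interior points with spacing h = 1/(N+1) and time step Δt.

```latex
% Model problem (assumed form: unit diffusion coefficient, source term f)
u_t - \Delta u = f \quad \text{in } \Omega = (0,1)^2,\ 0 < t \le T, \qquad
u = 0 \quad \text{on } \partial\Omega, \qquad u(x,y,0) = u_0(x,y).

% Mesh: x_i = ih,\ y_j = jh,\ h = 1/(N+1),\ t_n = n\,\Delta t,\ u_{i,j}^n \approx u(x_i, y_j, t_n)
\frac{u_{i,j}^{n+1} - u_{i,j}^{n}}{\Delta t}
  - \frac{u_{i-1,j}^{n+1} + u_{i+1,j}^{n+1} + u_{i,j-1}^{n+1} + u_{i,j+1}^{n+1} - 4u_{i,j}^{n+1}}{h^2}
  = f_{i,j}^{n+1}, \qquad i,j = 1,\dots,N.

% Matrix form: one N^2 x N^2 linear system per time step
(I + \Delta t\, A)\, u^{n+1} = u^{n} + \Delta t\, f^{n+1},
\qquad
(Au)_{i,j} = \frac{4u_{i,j} - u_{i-1,j} - u_{i+1,j} - u_{i,j-1} - u_{i,j+1}}{h^2}.
```

Boundary values such as u_{0,j} and u_{N+1,j} are zero by the homogeneous Dirichlet condition. Because A is symmetric positive definite, so is I + Δt A, which is what makes the conjugate gradient method applicable at every time step.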
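The time-stepping loop with a conjugate gradient solve at each step can be illustrated with a small serial sketch. It is not the thesis code: it assumes NumPy, a unit diffusion coefficient, an N x N interior mesh with h = 1/(N+1), and unpreconditioned CG; the names `apply_operator`, `cg_step`, and `backward_euler` are hypothetical. The operator I + Δt A is applied as a 5-point stencil on the grid rather than stored as a matrix.

```python
import numpy as np


def apply_operator(u, dt, h):
    """Return (I + dt*A) u, with A the 5-point finite difference -Laplacian.

    u is an (N, N) array of interior unknowns. Zero padding supplies the
    homogeneous Dirichlet boundary values, so they are never stored.
    """
    p = np.pad(u, 1)  # default mode 'constant' pads with zeros
    lap = (4.0 * p[1:-1, 1:-1]
           - p[:-2, 1:-1] - p[2:, 1:-1]
           - p[1:-1, :-2] - p[1:-1, 2:]) / h**2
    return u + dt * lap


def cg_step(b, x0, dt, h, tol=1e-10, max_iter=1000):
    """Solve (I + dt*A) x = b by unpreconditioned conjugate gradients.

    Stops when ||r|| <= tol * ||b||; returns (x, iterations used).
    """
    x = x0.copy()
    r = b - apply_operator(x, dt, h)
    p = r.copy()
    rr = np.vdot(r, r)  # vdot flattens the 2D arrays
    b_norm = np.sqrt(np.vdot(b, b))
    for k in range(max_iter):
        if np.sqrt(rr) <= tol * b_norm:
            return x, k
        q = apply_operator(p, dt, h)
        alpha = rr / np.vdot(p, q)
        x += alpha * p
        r -= alpha * q
        rr_new = np.vdot(r, r)
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x, max_iter


def backward_euler(u0, f, dt, h, n_steps):
    """Advance u_t - Laplace(u) = f by n_steps backward Euler steps.

    f(t) returns the (N, N) source term at time t. The previous solution
    serves as the initial guess for each CG solve.
    """
    u = u0.astype(float)
    for n in range(1, n_steps + 1):
        b = u + dt * f(n * dt)
        u, _ = cg_step(b, u, dt, h)
    return u
```

As a usage check, u0 = sin(πx) sin(πy) is an eigenvector of the discrete operator with eigenvalue λ_h = (8/h²) sin²(πh/2), so with f = 0 each step should divide the solution by exactly 1 + Δt λ_h. In a distributed-memory version, the stencil application would need neighbor values from other processes and each inner product a global reduction; these are the communication points that limit the scalability of CG.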

Keywords/Search Tags:

Performance, Using the Intel Xeon Phi, Parallel, Test problem, Results show, Mesh resolutions, Linear

Related items

1	Research On Video Algorithm Based On Intel Xeon Phi Many-core Architecture
2	Research On Graph Calculation Based On Xeon Phi Coprocessor
3	Research On Key Technologies For Realizing And Accelerating LARED-P On Intel Xeon Phi
4	Porting and Tuning Numerical Kernels in Real-World Applications to Many-Core Intel Xeon Phi Accelerators
5	Diffusion2D OpenMP Parallel Computing and Intel Xeon Phi Co-Processor Support Developmen
6	Reconfiguration and performance of distributed memory parallel systems
7	The Research And Development Of The Point Based Global Illumination Algorithm On The Intel MIC Architecture
8	Supporting Applications Involving Irregular Accesses and Recursive Control Flow on Emerging Parallel Environments
9	Research On Heterogeneous System Oriented Parallel Programming
10	Parallel Programming With Communication Efficiency On MIC-Enhanced Cluster