Research On Key Technologies For Realizing And Accelerating LARED-P On Intel Xeon Phi

Posted on:2014-02-12

Degree:Master

Type:Thesis

Country:China

Candidate:W K Yao

Full Text:PDF

GTID:2308330479479128

Subject:Computer Science and Technology

Abstract/Summary:

LARED-P is an important particle simulation code that performs three-dimensional laser-plasma simulation with the particle-in-cell (PIC) method. It plays an important role in understanding how intense lasers propagate in low-density plasma and the complex processes of laser-plasma interaction, which greatly helps research on controlled nuclear fusion. In practical applications, LARED-P handles large-scale problems and requires many time steps, so a single run is very long and can take several days. Using the accelerators in a heterogeneous system to speed up such time-consuming applications is therefore of great interest to computational scientists. Intel's new Many Integrated Core (MIC) product, the Intel Xeon Phi, combines wide vector units with a many-core architecture and can effectively accelerate general computing applications. Research on the key techniques for accelerating LARED-P on the Intel Xeon Phi is therefore important for shortening particle simulation time, reducing cost, and improving simulation efficiency.

The Intel Xeon Phi provides two main operating modes: Native mode and Offload mode. We focused on accelerating the two major parts of LARED-P, the particle equation of motion and the particle-cloud equation, in both modes. We modified several features of the LARED-P program that hinder parallel computing, accelerated LARED-P on the Intel Xeon Phi, and achieved good results. Our main contributions are as follows:

(1) We proposed several key techniques for accelerating LARED-P in Native mode, beginning with improved multithreaded parallelization. To remove the dependence between particle indices and grid indices during parallelization, the traditional approach builds an array that records the index of the first particle in each grid cell. Because each thread handles many grid cells, we record one index per thread instead of one per cell, which reduces the size of the index array and saves time. Because particles are in constant motion, the work assigned to each thread changes over time and becomes unbalanced; we therefore proposed a particle-based load-balancing algorithm that reduces the load imbalance between threads.

(2) Because LARED-P has poor memory access locality, we applied MIC-oriented loop tiling and prefetching to optimize memory access, which improved the memory access locality of the LARED-P program on the MIC.

(3) We established guidelines for using vectorization and applied them to vectorize the LARED-P algorithm on the MIC, making full use of its wide vector units. Combining the optimizations above, LARED-P running in Native mode on the MIC achieved a 2.4 times speedup relative to the CPU.

(4) We proposed dynamic load partitioning and double-buffering algorithms for the LARED-P program in Offload mode. In traditional Offload mode the CPU sits idle while the Intel Xeon Phi computes, which wastes CPU computing power. This thesis designs and implements a system that adjusts the load dynamically in Offload mode, so that the work is divided between the CPU and the Intel Xeon Phi in a dynamic equilibrium. We also used a balance space and a load-regulation ladder array to reduce the effect of hardware performance fluctuations on the load division; these optimizations also speed up the convergence of load balancing.

(5) To reduce the communication overhead between the CPU and the MIC in Offload mode, we used redundant transmission and intermediate arrays and proposed a double-buffering technique. Together these break the data dependencies in the LARED-P program, overlap data communication with computation, and hide transfer latency. These two optimizations ultimately give LARED-P a 49.4% performance improvement in Offload mode and a 2.9 times speedup compared to the CPU.
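The abstract does not spell out the particle-based load-balancing algorithm of item (1). One plausible scheme, offered here only as an assumption, recomputes the thread boundaries from particle counts instead of cell counts and moves each boundary to the nearest cell boundary, so that every cell is still handled by exactly one thread. The function name `balanced_thread_index` is hypothetical.

```python
from bisect import bisect_left, bisect_right


def balanced_thread_index(cell_of, nthreads):
    """Split cell-sorted particles into nthreads ranges of roughly equal size.

    Hypothetical scheme: returns start[0..nthreads]; thread t handles particles
    start[t]..start[t+1]-1. Each interior boundary is moved to the nearest
    cell boundary so that a cell is never split between two threads.
    Assumes cell_of is sorted ascending.
    """
    n = len(cell_of)
    start = [0]
    for t in range(1, nthreads):
        target = (t * n) // nthreads          # ideal split by particle count
        if target < n:
            c = cell_of[target]
            lo = bisect_left(cell_of, c)      # first particle of that cell
            hi = bisect_right(cell_of, c)     # one past its last particle
            target = lo if target - lo <= hi - target else hi
        start.append(max(target, start[-1]))  # keep boundaries non-decreasing
    start.append(n)
    return start


# Particles crowded into cells 0-3 of an 8-cell grid.
cells = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 6]
print(balanced_thread_index(cells, 2))        # -> [0, 6, 12]
```

For this example, a fixed split into cells 0-3 and 4-7 would give the two threads 11 and 1 particles, whereas the count-based split gives 6 and 6. Rerunning the function every few time steps lets the boundaries follow the particles as they move.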
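The loop tiling named in item (2) reorders a nested loop so that it finishes one small block of the grid before moving on to the next. The sketch below applies tiling to a generic 5-point average over a 2-D grid, which is an illustrative stand-in and not an actual LARED-P kernel. Python cannot show the cache effect itself, and prefetching is not modeled. The sketch shows only the loop restructuring and checks that the tiled sweep computes exactly the same values as the plain sweep.

```python
def smooth_untiled(src):
    """Reference 5-point average over the interior of a 2-D grid (row-major)."""
    ny, nx = len(src), len(src[0])
    dst = [row[:] for row in src]
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            dst[j][i] = 0.2 * (src[j][i] + src[j - 1][i] + src[j + 1][i]
                               + src[j][i - 1] + src[j][i + 1])
    return dst


def smooth_tiled(src, tile_j, tile_i):
    """Same computation, but the interior is swept tile by tile.

    On a cache-based processor the tile sizes would be chosen so that the rows
    touched by one tile stay in cache while that tile is processed.
    """
    ny, nx = len(src), len(src[0])
    dst = [row[:] for row in src]
    for j0 in range(1, ny - 1, tile_j):                      # loop over tiles
        for i0 in range(1, nx - 1, tile_i):
            for j in range(j0, min(j0 + tile_j, ny - 1)):    # loop within a tile
                for i in range(i0, min(i0 + tile_i, nx - 1)):
                    dst[j][i] = 0.2 * (src[j][i] + src[j - 1][i] + src[j + 1][i]
                                       + src[j][i - 1] + src[j][i + 1])
    return dst
```

Because each element is computed with the same expression in both versions, the results match exactly for any tile size. On the MIC only the traversal order, and therefore the cache reuse, would differ.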
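Item (3) mentions vectorization guidelines without listing them. Two guidelines commonly used for wide-vector processors are to store particle data as a structure of arrays (SoA), so that each attribute is contiguous in memory, and to keep the inner loop free of data-dependent branches. The sketch below applies both to a simplified one-dimensional, non-relativistic, electric-field-only particle push. This push is an assumption for illustration, not LARED-P's equation of motion, and the function names are hypothetical. Python does not emit SIMD instructions, so the point is the data layout and the loop shape that a C or Fortran compiler could vectorize.

```python
def aos_to_soa(particles):
    """Convert [(x, v), ...] (array of structures) into ([x, ...], [v, ...])."""
    xs = [p[0] for p in particles]
    vs = [p[1] for p in particles]
    return xs, vs


def push_soa(xs, vs, e_at_particle, q_over_m, dt, length):
    """Simplified leapfrog push on SoA arrays in a periodic domain [0, length).

    e_at_particle[k] is the electric field already interpolated to particle k.
    Every iteration performs the same arithmetic on element k of each array,
    and the periodic wrap uses the modulo operator instead of an if/else.
    """
    for k in range(len(xs)):
        vs[k] += q_over_m * e_at_particle[k] * dt
        xs[k] = (xs[k] + vs[k] * dt) % length


xs, vs = aos_to_soa([(9.5, 2.0), (0.25, -1.0), (4.0, 0.0)])
push_soa(xs, vs, [0.0, 0.0, 4.0], q_over_m=1.0, dt=0.5, length=10.0)
print(xs, vs)   # -> [0.5, 9.75, 5.0] [2.0, -1.0, 2.0]
```

The first two particles leave the domain on the right and left and wrap around without any branch in the loop body.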
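The dynamic CPU/coprocessor load division of item (4) can be sketched as a feedback rule that runs once per time step. The following is a minimal sketch under stated assumptions. The measured times of both devices give a target share that would make them finish together. A dead band ignores small imbalances caused by hardware noise, standing in for the "balance space". A table of step limits that shrink as the imbalance shrinks stands in for the "load-regulation ladder array". All constants and the function name `next_mic_fraction` are hypothetical.

```python
# Hypothetical tuning constants; the thesis's actual balance-space and
# ladder-array values are not given in the abstract.
DEAD_BAND = 0.03                  # imbalance below 3% is treated as noise
LADDER = [(0.30, 0.10),           # (imbalance at least this, max change of the share)
          (0.10, 0.04),
          (0.00, 0.01)]
F_MIN, F_MAX = 0.05, 0.95         # both devices always keep some work


def next_mic_fraction(f_mic, t_cpu, t_mic):
    """Update the share of particles sent to the coprocessor for the next step.

    t_cpu and t_mic are the measured times of the two devices for the last step.
    """
    imbalance = abs(t_mic - t_cpu) / max(t_mic, t_cpu)
    if imbalance < DEAD_BAND:
        return f_mic                                   # close enough: do not move
    r_mic = f_mic / t_mic                              # work per second on each side
    r_cpu = (1.0 - f_mic) / t_cpu
    target = r_mic / (r_mic + r_cpu)                   # share that equalizes times
    step = next(s for thr, s in LADDER if imbalance >= thr)
    delta = max(-step, min(step, target - f_mic))      # limit the move to the ladder step
    return max(F_MIN, min(F_MAX, f_mic + delta))


# Simulated devices: the coprocessor is assumed to be twice as fast as the host.
f = 0.5
for _ in range(10):
    f = next_mic_fraction(f, t_cpu=(1.0 - f) / 1.0, t_mic=f / 2.0)
print(round(f, 3))   # -> 0.667
```

In this simulation the share moves 0.5, 0.6, 0.64, then about 0.667 and stays there. Large steps are taken while the imbalance is large, and the dead band keeps measurement jitter from causing oscillation.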
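The benefit of the double buffering in item (5) can be estimated with a small timeline model. The following is a minimal sketch, assuming the following setup: the offloaded data is split into chunks with known transfer and compute times, there is one transfer channel, and there are two buffers on the coprocessor. It models scheduling only, ignores the transfer of results back to the host, and does not use any real offload API. The function names are hypothetical.

```python
def serial_makespan(transfer, compute):
    """Offload without overlap: each chunk is sent, then computed."""
    return sum(transfer) + sum(compute)


def double_buffered_makespan(transfer, compute):
    """Two device buffers, one transfer channel, one compute engine.

    Chunk k is copied into buffer k % 2 while the previous chunk is computed
    from the other buffer; a buffer is reused only after the chunk stored in
    it has been computed. Transfers of results back to the host are ignored.
    """
    link_free = 0.0           # when the transfer channel becomes idle
    compute_free = 0.0        # when the compute engine becomes idle
    buf_free = [0.0, 0.0]     # when each buffer may be overwritten
    for k, (tx, tc) in enumerate(zip(transfer, compute)):
        b = k % 2
        arrive = max(link_free, buf_free[b]) + tx      # copy chunk k into buffer b
        link_free = arrive
        compute_free = max(arrive, compute_free) + tc  # compute once data and engine are ready
        buf_free[b] = compute_free
    return compute_free


print(serial_makespan([1] * 4, [2] * 4))           # -> 12
print(double_buffered_makespan([1] * 4, [2] * 4))  # -> 9.0
```

Take four chunks that each need 1 time unit to transfer and 2 to compute. The serial schedule takes 12 units, while the double-buffered one takes 9, because only the first transfer remains exposed. When transfers dominate, the compute time is the part that gets hidden instead.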

Keywords/Search Tags:

Intel Xeon Phi, Particle Simulation, LARED-P, Accelerating, Native Mode, Offload Mode

Related items

1	Research On Video Algorithm Based On Intel Xeon Phi Many-core Architecture
2	Accelerating Pattern Matching in Neuromorphic Text Recognition System Using Intel Xeon Phi Coprocessor
3	Parallel performance studies for a linear parabolic test problem using the Intel Xeon Phi
4	Accelerating Monte Carlo Simulation Of Neutron Transport On The Intel MIC
5	Research On Graph Calculation Based On Xeon Phi Coprocessor
6	Porting and Tuning Numerical Kernels in Real-World Applications to Many-Core Intel Xeon Phi Accelerators
7	The Research And Development Of The Point Based Global Illumination Algorithm On The Intel MIC Architecture
8	The Research Of Decreasing Differential Mode Group Delay Utilizing Strong Mode Coupling In Few Mode Fiber
9	Theoretical And Technical Research About Multi-Mode Transmission Based On Mode Multiplexing
10	Diffusion2D OpenMP Parallel Computing and Intel Xeon Phi Co-Processor Support Development