
Research On Key Technologies For Realizing And Accelerating LARED-P On Intel Xeon Phi

Posted on: 2014-02-12 | Degree: Master | Type: Thesis
Country: China | Candidate: W K Yao | Full Text: PDF
GTID: 2308330479479128 | Subject: Computer Science and Technology
Abstract/Summary:
LARED-P is an important particle simulation algorithm that performs three-dimensional laser-plasma simulation using the particle-in-cell method. It is of great significance for understanding how intense lasers propagate in low-density plasma and the complex interaction between the two, which in turn supports research on controlled nuclear fusion. In practical applications, LARED-P handles large-scale problems and requires a large number of time steps, so a single run can take several days. Accelerating such time-consuming applications with the accelerator components of a heterogeneous system is therefore of real value to computational scientists.
Intel Xeon Phi, Intel's Many Integrated Core (MIC) product, combines wide vector units with a many-core architecture and can effectively accelerate general computational applications. Studying how to accelerate LARED-P with the key acceleration techniques of Intel Xeon Phi is important for shortening particle simulation time, reducing cost, and improving simulation efficiency.
Intel Xeon Phi provides two main operating modes, Native mode and Offload mode. We focused on accelerating the two major parts of LARED-P, the particle motion equation and the particle-cloud equation, in both Native mode and Offload mode. We changed the features of the LARED-P program that hinder parallel computing, accelerated LARED-P on Intel Xeon Phi, and achieved good results. Our main contributions are as follows:
(1) We proposed several key techniques to accelerate the LARED-P program in Native mode. We improved the multi-threaded parallelization of LARED-P.
To eliminate the dependence between particle indices and grid indices during parallelization, the traditional approach builds an array that records the index of the first particle in each grid cell. Because each thread handles many grid cells, we record an index per thread instead of per grid cell, which reduces the size of the index array and saves time. Because particles move constantly and cause load imbalance among threads, we proposed a particle-based load balancing algorithm that reduces this imbalance.
(2) Because LARED-P has poor memory access locality, we applied MIC-oriented loop tiling and prefetching to optimize memory access, which improved the memory access locality of the LARED-P program on the MIC.
(3) We established a guideline for the use of vectorization and applied it to vectorize the LARED-P algorithm on the MIC, making full use of the wide vector units. With the optimizations above, LARED-P running in MIC Native mode achieves a 2.4 times speedup relative to the CPU.
(4) We proposed dynamic load partitioning and double buffering algorithms for the LARED-P program in Offload mode. In traditional Offload mode the CPU is idle while Intel Xeon Phi computes, which wastes CPU computing power. This thesis designs and implements a system that adjusts the load dynamically in Offload mode to reach a dynamically balanced division of the load between the CPU and Intel Xeon Phi. We also used a balance interval ("balanced space") and a load-regulation ladder array to reduce the impact of hardware performance fluctuations on the load division.
Moreover, these optimizations speed up the convergence of load balancing.
(5) To reduce the communication overhead between the CPU and the MIC in Offload mode, we used redundant transmission and intermediate arrays and proposed a double buffering technique that breaks the data dependencies in the LARED-P program, overlaps data communication with computation, and hides transfer latency. Together, these two optimizations give the LARED-P program a 49.4% performance improvement in Offload mode and a 2.9 times speedup compared to the CPU.
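
The indexing idea in item (1) can be illustrated with a small sketch. It is not the thesis's code: it assumes particles are already sorted by grid cell and that each thread owns a contiguous block of cells, and the function names `per_cell_index` and `per_thread_index` are hypothetical. The point is only that the per-thread array has one entry per thread boundary rather than one per cell.

```python
from bisect import bisect_left


def per_cell_index(cell_of_particle, n_cells):
    """Traditional approach: index of the first particle in every grid cell.

    cell_of_particle[i] is the cell id of particle i, sorted ascending.
    The result has n_cells + 1 entries (the last one is the particle count).
    """
    return [bisect_left(cell_of_particle, c) for c in range(n_cells + 1)]


def per_thread_index(cell_of_particle, n_cells, n_threads):
    """Per-thread approach: index of the first particle of each thread's cells.

    Assumption (not from the thesis): thread t owns the contiguous cell range
    [t * n_cells // n_threads, (t + 1) * n_cells // n_threads).
    The result has only n_threads + 1 entries.
    """
    cell_bounds = [t * n_cells // n_threads for t in range(n_threads + 1)]
    return [bisect_left(cell_of_particle, c) for c in cell_bounds]


def particles_for_thread(thread_index, t):
    """Half-open particle range [start, stop) that thread t should process."""
    return range(thread_index[t], thread_index[t + 1])
```

With many cells per thread, `per_thread_index` is much smaller than `per_cell_index`, which is the saving item (1) describes; how the thesis actually assigns cells to threads is not stated in the abstract.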
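
The dynamic load partition in item (4) can be pictured as a feedback loop over time steps. The sketch below is one possible reading of the "balance interval" and "ladder array" and is not taken from the thesis: the function name `adjust_ratio`, the tolerance, and the ladder thresholds and step sizes are all illustrative assumptions.

```python
def adjust_ratio(ratio, t_cpu, t_mic, tolerance=0.05,
                 ladder=((0.50, 0.10), (0.20, 0.05), (0.0, 0.01))):
    """Return the new fraction of work to send to the coprocessor.

    ratio     -- current fraction of the load given to the coprocessor
    t_cpu     -- measured CPU time for the last step
    t_mic     -- measured coprocessor time for the last step
    tolerance -- assumed balance interval: relative imbalances no larger
                 than this are treated as noise and ignored
    ladder    -- assumed (threshold, step) rungs, largest threshold first:
                 the bigger the imbalance, the bigger the adjustment
    """
    slowest = max(t_cpu, t_mic)
    if slowest == 0:
        return ratio
    imbalance = (t_cpu - t_mic) / slowest
    if abs(imbalance) <= tolerance:
        return ratio  # inside the balance interval: keep the partition
    # Pick the first rung whose threshold the imbalance exceeds.
    step = next(s for threshold, s in ladder if abs(imbalance) > threshold)
    # A slower CPU means the coprocessor should take more work, and vice versa.
    new_ratio = ratio + step if imbalance > 0 else ratio - step
    return min(1.0, max(0.0, new_ratio))


def simulate(steps, cpu_rate=1.0, mic_rate=3.0, ratio=0.5):
    """Toy model with made-up device rates; returns the ratio after `steps`."""
    for _ in range(steps):
        t_cpu = (1.0 - ratio) / cpu_rate
        t_mic = ratio / mic_rate
        ratio = adjust_ratio(ratio, t_cpu, t_mic)
    return ratio
```

In this toy model the ratio moves in large steps while the imbalance is large and in small steps near balance, and small timing fluctuations inside the tolerance leave the partition unchanged, which matches the stated goals of faster convergence and robustness to hardware performance fluctuations.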
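
The double buffering in item (5) follows a general pattern: while one buffer is being computed on, the next chunk is transferred into the other. The sketch below shows only that control flow, using a background thread as a stand-in for an asynchronous transfer. The `transfer` and `compute` functions are hypothetical placeholders, and the redundant transmission and intermediate arrays the thesis uses to break data dependencies are not modeled.

```python
from concurrent.futures import ThreadPoolExecutor


def transfer(chunk):
    """Placeholder for copying one chunk to the coprocessor."""
    return list(chunk)


def compute(buffer):
    """Placeholder for the kernel that runs on a transferred chunk."""
    return sum(x * x for x in buffer)


def process_double_buffered(chunks):
    """Process chunks while prefetching the next one in the background."""
    results = []
    if not chunks:
        return results
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(transfer, chunks[0])  # fill the first buffer
        for i in range(len(chunks)):
            current = pending.result()  # wait until buffer i has arrived
            if i + 1 < len(chunks):
                # Start filling the other buffer before computing on this one.
                pending = pool.submit(transfer, chunks[i + 1])
            results.append(compute(current))  # runs while the transfer proceeds
    return results
```

On real hardware the overlap would come from asynchronous offload transfers rather than a Python thread, so this sketch demonstrates the ordering of operations, not the achievable speedup.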
Keywords/Search Tags: Intel Xeon Phi, Particle Simulation, LARED-P, Accelerating, Native Mode, Offload Mode