
Research on CFD Simulation and Performance Optimization Technology for the Tianhe-2 Heterogeneous Many-core Platform

Posted on: 2015-05-23  Degree: Master  Type: Thesis
Country: China  Candidate: Z F Song  Full Text: PDF
GTID: 2348330509960905  Subject: Software engineering
Abstract/Summary:
Pairing coprocessors or accelerators with the main processor is becoming a new trend in high-performance computer architecture. Heterogeneous architectures that use a graphics processing unit (GPU) as an accelerator, or an Intel Many Integrated Core (MIC) coprocessor, are increasingly mainstream among high-end computers. In the TOP500 list of the world's fastest supercomputers for the first half of 2014, for example, 4 of the top 10 systems and 7 of the top 15 were heterogeneous. Migrating existing applications smoothly to these heterogeneous platforms is therefore becoming an increasingly challenging problem for developers of high-performance computing applications. This thesis takes typical computational fluid dynamics (CFD) applications as its starting point, analyzes their computation and memory access characteristics, and explores collaborative parallelization and performance optimization methods for such applications on CPU + MIC heterogeneous platforms, represented by Tianhe-2, in order to provide methods and a technical reference for other large-scale heterogeneous collaborative parallel applications.
Current CFD simulation relies mainly on two approaches: discrete solution of the Navier-Stokes (NS) equations and solution of the lattice Boltzmann equation. The thesis first analyzes the computational characteristics of these applications, and the results indicate that both approaches are memory-access-bound with low computational intensity. It then studies in detail how to port memory-access-bound applications, represented by CFD, to CPU + MIC heterogeneous platforms and optimize their performance.
Because real CFD applications involve complex physical and computational processes, we first take a simple model with the same memory-access-bound behavior, the force-directed algorithm SORGRAD, as an example and explore acceleration and optimization methods on homogeneous platforms, that is, a pure CPU platform and a pure MIC platform. We then take NPB BT-MZ, a discrete NS equation solver, and OpenLBMflow, a lattice Boltzmann method (LBM) application, as examples and study collaborative parallel migration and performance optimization of both types of CFD solver on the Tianhe-2 heterogeneous platform.
According to the focus of the parallelization and optimization techniques and the platform environment, the research is organized into two parts. The first covers parallelization and optimization of typical CFD applications on homogeneous (pure CPU or pure MIC) many-core platforms, focusing on the performance model, application characteristics, and efficient parallelization and performance optimization techniques. The second covers CFD numerical simulation in the Tianhe CPU + MIC heterogeneous environment, focusing on the collaborative parallelization and optimization methods specific to heterogeneous environments. The specific research work and main innovations are as follows:
(A) Parallelization and optimization of typical CFD applications on Tianhe homogeneous many-core platforms:
(1) Using the roofline performance model with computational intensity as the metric, we analyze the characteristics of typical CFD solvers. This analysis identifies where to begin parallel porting and provides a theoretical foundation and decision basis for subsequent performance optimization.
The results show that traditional CFD solution methods are generally memory-access-bound applications with low computational intensity, which indicates that memory access performance should be the primary optimization target during parallel porting and performance optimization.
(2) Taking the force-directed algorithm (SORGRAD) as an example, we propose and implement acceleration and optimization methods at the data and instruction levels of parallelism on homogeneous CPU and homogeneous MIC platforms. Data-level parallelism uses OpenMP multithreading; instruction-level parallelism vectorizes the core module of the algorithm with single instruction, multiple data (SIMD) instructions. When porting to the MIC platform, we focus on analyzing and testing the parallel effect of its wider vector instructions. Numerical tests show that when the program runs in parallel in MIC native mode with a data size greater than 8704, the parallel program is up to about 600 times faster than the serial program. The parallelization and optimization experience gained from this memory-access-bound force-directed problem can be extended to CFD applications with more complex physical processes.
(3) For CFD applications solved with the lattice Boltzmann method (LBM), we propose and implement an MPI + OpenMP hybrid parallel method that combines task-level, data-level, and instruction-level parallelism. The results show that the LBM application has good strong and weak scalability, and that multithreaded optimization on the CPU improves performance by about 14 times.
With single-core optimization, the serial program runs up to 2.97 times faster at a data size of 512 * 256 * 256 (the test size throughout unless otherwise specified); multithreaded optimization improves performance by about 14 times. For cross-node parallelism, the MPI communication order was rearranged, and the results show that large-scale parallel LBM computation has good weak and strong scalability. Instruction-level SIMD optimization matches the memory access order to the computation order, which effectively improves the compute-to-memory-access ratio.
(4) For CFD applications based on discrete NS solution (NPB BT-MZ), we explore a parallel algorithm that computes the viscous and inviscid terms concurrently and also computes the viscous terms in different dimensional directions in parallel. Analysis and testing verify the correctness of the method, and performance tests show that the new parallel algorithm improves performance by 2.8 times.
(B) Collaborative parallelization of CFD applications in the Tianhe CPU + MIC heterogeneous environment:
(1) For LBM on the heterogeneous platform, we propose a collaborative offload parallel method based on asynchronous transfer. The results show that the implementation hides the communication time between CPU and MIC well and, on a single node, achieves a speedup of 69.24 times over the serial CPU program; large-scale tests on Tianhe-2 show that the collaborative parallel method has good weak scalability.
(2) For BT-MZ on the heterogeneous platform, we propose a parallel method based on nested threads combined with the idea of pipeline parallelism. The CPU + MIC heterogeneous version runs 2.14 times faster than the pure CPU version.
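The roofline reasoning used in part (A)(1) can be illustrated with a minimal sketch. It assumes the standard roofline formula, in which attainable performance is the smaller of peak compute throughput and memory bandwidth multiplied by computational intensity. The function names and all numeric values below are hypothetical, chosen only for illustration; they are not measurements from the thesis or specifications of Tianhe-2 hardware.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, intensity):
    """Roofline bound: min(peak compute, bandwidth * intensity).

    intensity is computational intensity in FLOP per byte moved
    to or from main memory.
    """
    return min(peak_gflops, bandwidth_gbs * intensity)


def is_memory_bound(peak_gflops, bandwidth_gbs, intensity):
    """A kernel is memory-access-bound when its intensity lies
    below the ridge point peak / bandwidth."""
    return intensity < peak_gflops / bandwidth_gbs


if __name__ == "__main__":
    # Hypothetical machine: 1000 GFLOP/s peak, 100 GB/s memory bandwidth.
    peak, bandwidth = 1000.0, 100.0
    # Hypothetical low-intensity kernel: 0.5 FLOP per byte.
    kernel_intensity = 0.5
    bound = attainable_gflops(peak, bandwidth, kernel_intensity)
    print(f"attainable: {bound} GFLOP/s, "
          f"memory-bound: {is_memory_bound(peak, bandwidth, kernel_intensity)}")
```

With these made-up numbers the kernel sits well below the ridge point of 10 FLOP per byte, so raising its compute throughput alone would not help; this is the sense in which memory access becomes the primary optimization target for low-intensity solvers.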
Keywords/Search Tags: Memory-access-bound, CFD, Force-directed algorithm, LBM, NPB-MZ, Parallel optimization, CPU+MIC