
Research On Large-scale CPU And MIC Heterogeneous Parallel Computing Techniques For Detonation Simulation

Posted on: 2018-06-13
Degree: Master
Type: Thesis
Country: China
Candidate: T T Wang
Full Text: PDF
GTID: 2428330569998836
Subject: Software engineering
Abstract/Summary:
AMROC (Adaptive Mesh Refinement in Object-oriented C++) is an open-source, block-structured adaptive mesh refinement framework. It can be used to simulate multiphase flow, combustion, fluid-structure interaction, and similar problems. Initition3d is a jet detonation simulation code based on AMROC and developed by the National University of Defense Technology. It is used to study jet detonation initiation and propagation in supersonic flow. Initition3d involves a large amount of computation, and its computational cost is very high. To improve its performance, this thesis studies parallel computing techniques targeting a heterogeneous platform based on multicore CPUs and the Intel Many Integrated Core (MIC) coprocessor. The main work and results are as follows:

(1) The structure and algorithms of the AMROC framework are analyzed, and the performance of the AMROC-based thermal injection detonation initiation and propagation simulation is tested and analyzed. The tests of Initition3d cover MPI parallel performance, the breakdown of total runtime by component, and MPI communication performance. The results show that as the number of processes increases, the underlying AMR algorithm incurs serious MPI communication overhead, so parallel efficiency gradually decreases. The computation workload and the MPI communication workload are also unbalanced across processes.

(2) Initition3d was originally parallelized with pure MPI. This thesis adds OpenMP parallelization to Initition3d and optimizes its performance. OpenMP parallel scalability and hybrid MPI/OpenMP parallel performance are then evaluated. The results show that with 16 threads the OpenMP parallel efficiency is still above 60%. Using 2 MPI processes × 12 OpenMP threads gives the best performance on one node, 1.83 times that of the original MPI-only version. The thesis then analyzes why OpenMP parallel efficiency decreases by measuring the OpenMP library overhead, the load imbalance between threads, and the cost of the serial portion. The results reveal that creating and destroying OpenMP threads incurs substantial overhead, which lengthens the execution time of the main thread and unbalances the load between threads, thereby reducing OpenMP parallel efficiency.

(3) The thesis analyzes the difficulty of porting to MIC based on the characteristics of the application code. It then implements a preliminary heterogeneous version of the kernel parts of Initition3d based on the OpenMP 4.0 standard. Compiler optimizations, vectorization optimizations, OpenMP thread optimizations, and optimizations of data transfer between the CPU and the MIC are also applied. The results show that these parts achieve good OpenMP parallel speedup on the MIC coprocessor. Finally, Intel VTune is used to analyze the performance. The analysis shows that three main factors limit performance on the MIC coprocessor: (a) single-thread performance on the MIC coprocessor is far below that of a single Xeon CPU core; (b) the characteristics of the Initition3d code make it difficult to make good use of the MIC coprocessor; (c) the OpenMP library overhead is rather large, which limits OpenMP parallel scalability.
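To make the efficiency figure in item (2) concrete, the following is a minimal Python sketch of the standard definitions of parallel efficiency (speedup divided by the number of workers) and of Amdahl's law. The function names are hypothetical and are not taken from the thesis. The Amdahl estimate assumes that the serial fraction alone limits scaling. The abstract attributes the loss to thread creation overhead and load imbalance as well, so the result is only an upper bound on the serial fraction, not a measured value.

```python
def min_speedup(efficiency: float, workers: int) -> float:
    """Speedup implied by a given parallel efficiency (efficiency = speedup / workers)."""
    return efficiency * workers


def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Amdahl's law: speedup when only `parallel_fraction` of the work scales."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / workers)


def amdahl_parallel_fraction(speedup: float, workers: int) -> float:
    """Invert Amdahl's law: parallel fraction that would yield `speedup`."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / workers)


if __name__ == "__main__":
    threads = 16        # thread count quoted in the abstract
    efficiency = 0.60   # lower bound on efficiency quoted in the abstract

    speedup = min_speedup(efficiency, threads)
    fraction = amdahl_parallel_fraction(speedup, threads)

    print(f"speedup at {threads} threads: at least {speedup:.2f}x")
    # Assumption: all lost efficiency is attributed to serial code (Amdahl only).
    print(f"serial fraction under Amdahl only: at most {1.0 - fraction:.3f}")
```

Under these assumptions, an efficiency above 60% at 16 threads corresponds to a speedup above 9.6, which a purely serial bottleneck could explain only if less than about 4.5% of the runtime were serial.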
Keywords/Search Tags:Jet detonation simulation, parallel algorithms, heterogeneous programming, performance optimization, parallel efficiency