
Research On Parallel Methods Using High-order Scheme On GPU

Posted on: 2020-05-17; Degree: Master; Type: Thesis
Country: China; Candidate: J Lei; Full Text: PDF
GTID: 2480306548493494; Subject: Mechanics
Abstract/Summary:
With the development of computational fluid dynamics (CFD), accurate and efficient numerical simulation of complex viscous problems has become an important research topic. Such problems often require high-order methods and high-fidelity physical models, such as large eddy simulation (LES) and direct numerical simulation (DNS) of turbulence. These methods demand a huge amount of computation, which poses a severe challenge to computing efficiency. In recent years, as GPUs (graphics processing units) and their programming languages have developed rapidly, more and more researchers have begun to use heterogeneous architectures that pair central processing units (CPUs) with GPU coprocessors to improve computing efficiency. The powerful floating-point capability and high memory bandwidth of the GPU provide hardware support for accelerating compute-intensive CFD applications. In this work, the original CPU program, based on the fifth-order WCNS scheme, is ported to the GPU to improve computing efficiency significantly. The thesis focuses on the implementation process and thread configuration of the program on the GPU, investigates optimizations of parallelism, memory access, and instruction execution, and summarizes the factors that restrict the application of high-order schemes on GPUs. The main work includes:

1. According to the hardware architecture, programming model, and memory model of the GPU, the execution strategy of the high-order program is redesigned to make it better suited to parallel computing on the GPU.

2. Using the profiler, the performance bottlenecks of the ported kernels are identified and the kernels are optimized one by one. High-bandwidth on-chip memories such as shared memory and registers are used as fully as possible, and the use of high-latency local memory is reduced as far as possible. The program is also optimized at the instruction level to increase the utilization of hardware resources. At the same time, the bottleneck restricting the performance of programs with high-order schemes is pointed out.

3. The fifth-order WCNS scheme gives accurate numerical results for the one-dimensional Riemann problem, the two-dimensional NACA0012 airfoil flow and nozzle flow, and the three-dimensional Taylor-Green vortex problem. Because the one-dimensional problem has a simple structure, a small global memory access stride, and boundaries that are easy to handle, it reaches a maximum speedup of 87.7 with 4000 grid cells. On two-dimensional curvilinear grids, the speedup of the convection term for the NACA0012 airfoil is 132.1, and the speedup for the nozzle with a single-block grid is 71.2, which is significantly higher than that of a multi-block grid with the same number of mesh cells. When complex boundaries are handled in these problems, the performance bottleneck of the program is the data transfer between CPU and GPU. In the simulation of the three-dimensional Taylor-Green vortex problem, the maximum speedup is 54.3 with 221³ grid cells, and the speedup improves as the mesh size increases. Because of the stride of global memory access, the computing efficiency of the kernels decreases successively in the ξ, η, and ζ directions for the same mesh size.
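
The link between access stride and kernel efficiency noted above can be illustrated with a small sketch. It assumes a three-dimensional field stored as one flat array with the first index varying fastest; the abstract does not state the actual data layout, and the names `linear_index` and `stride_along` are hypothetical. Under that assumption, the distance in memory between neighbouring stencil points grows from one direction to the next, which is consistent with the efficiency decreasing direction by direction.

```python
# Illustrative sketch only: the layout (first index fastest) and all names
# here are assumptions, not taken from the thesis.

def linear_index(i, j, k, ni, nj):
    """Flatten (i, j, k) for an ni x nj x nk array with i varying fastest."""
    return i + ni * (j + nj * k)


def stride_along(direction, ni, nj):
    """Distance, in elements, between neighbouring cells along one direction."""
    step = {"i": (1, 0, 0), "j": (0, 1, 0), "k": (0, 0, 1)}[direction]
    return linear_index(*step, ni, nj) - linear_index(0, 0, 0, ni, nj)


if __name__ == "__main__":
    n = 221  # cells per direction, matching the 221^3 Taylor-Green case
    for d in "ijk":
        print(d, stride_along(d, n, n))
```

With 221 cells per direction, neighbouring points are 1, 221, and 48841 elements apart along the first, second, and third index. A stencil swept along the first index reads adjacent addresses, while sweeps along the other two indices read increasingly distant ones.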
Keywords/Search Tags:High-order Scheme, GPU, CUDA, Computational Fluid Dynamics