Research On Optimization Of Finite Difference Algorithm On MIC And GPU Platforms

Posted on: 2018-10-06    Degree: Master    Type: Thesis
Country: China    Candidate: X Hao    Full Text: PDF
GTID: 2348330563451318    Subject: Computer Science and Technology
Abstract/Summary:
In recent years, high-performance computer systems based on heterogeneous many-core architectures have achieved great success. Compared with the rapid progress of hardware, however, the development of high-performance applications is limited by heterogeneous programming models and parallel execution efficiency. As an important algorithm in electromagnetics, seismology, fluid mechanics, and other fields, the finite difference algorithm largely determines the performance of these applications, yet to date there has been little research on efficient parallel finite difference algorithms for many-core platforms. This thesis analyzes the characteristics of the finite difference algorithm, which is used to solve linear differential equations, and identifies three problems that arise when optimizing it on the MIC and GPU many-core platforms: non-uniform memory access, heterogeneous cooperation, and multi-node parallel execution.

On the MIC platform, we propose a three-step progressive method to optimize the 3D finite difference (3DFD) algorithm on Intel MIC (Many Integrated Core) coprocessors. First, basic optimizations such as branch elimination, loop unrolling, and invariant extraction are applied to reduce computational strength and remove obstacles to SIMD (Single Instruction Multiple Data) vectorization. Second, parallel optimizations such as data dependence analysis, loop tiling, and intrinsic SIMD instructions are used to take full advantage of the MIC coprocessor's many hardware threads and wide vector units. Finally, heterogeneous cooperative optimizations, including data transfer minimization and load balancing, are applied on the CPU+MIC platform so that the algorithm executes in parallel on both the CPU and the MIC.

On the GPU platform, we extend the optimization from a single GPU to multiple GPUs. First, we implement the parallel finite difference algorithm with the CUDA programming model and adopt a pipelined execution scheme with multiple CUDA streams to maximize the utilization of GPU resources. Second, to address the non-uniform memory access pattern, we make full use of the low-latency, programmable on-chip shared memory to implement blocked (tiled) parallel computation. Finally, by partitioning the program data, we implement the parallel finite difference computation on multiple GPU nodes and achieve linear speedup with the number of GPUs, using optimizations such as pre-computing the overlapping halo regions of the data grid and employing peer-to-peer transfers to eliminate data-transfer cost.

Finally, we design experiments to evaluate the effect of the above optimizations on the finite difference algorithm on the MIC and GPU platforms separately, verifying correctness against the corresponding serial programs. Experimental results show that the optimized 3DFD algorithm achieves up to roughly 120x speedup over the serial program on both the MIC and GPU platforms. Moreover, the parallel optimization achieves linear speedup across multiple GPU nodes, demonstrating good parallelism and scalability. The optimization methods used in this thesis, especially those for heterogeneous cooperation and multi-node parallelism, provide an optimization guide for other applications running on heterogeneous many-core platforms.
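To make the shared-memory blocking described above concrete, the following sketch shows a 7-point 3D finite-difference step that caches the current x-y plane in on-chip shared memory and marches along z in registers. It is a minimal illustration of the technique, not the thesis code: the kernel name, tile sizes, stencil coefficients, and the assumption that the grid dimensions are multiples of the tile sizes are all ours.

```cuda
// Minimal sketch of a shared-memory-tiled 3D finite-difference step.
// Launch with dim3 block(TILE_X, TILE_Y), grid(nx / TILE_X, ny / TILE_Y).
// Assumes nx is a multiple of TILE_X and ny a multiple of TILE_Y.
#include <cuda_runtime.h>

#define TILE_X 32
#define TILE_Y 8

// 7-point stencil: out = c0*in + c1*(sum of the six axis neighbours).
__global__ void fd_step_shared(const float* __restrict__ in,
                               float* __restrict__ out,
                               int nx, int ny, int nz,
                               float c0, float c1)
{
    __shared__ float tile[TILE_Y + 2][TILE_X + 2];

    int ix = blockIdx.x * TILE_X + threadIdx.x;   // global x index
    int iy = blockIdx.y * TILE_Y + threadIdx.y;   // global y index
    int tx = threadIdx.x + 1;                     // tile position (skip halo)
    int ty = threadIdx.y + 1;
    size_t plane = (size_t)nx * ny;

    // Keep the z-1 / z / z+1 values in registers so only the current
    // x-y plane has to live in shared memory.
    float below = in[iy * nx + ix];               // z = 0 plane
    float cur   = in[plane + iy * nx + ix];       // z = 1 plane

    for (int iz = 1; iz < nz - 1; ++iz) {
        float above = in[(iz + 1) * plane + iy * nx + ix];

        __syncthreads();                          // previous plane fully consumed
        tile[ty][tx] = cur;
        // Edge threads also load the halo cells their neighbours need.
        if (threadIdx.x == 0 && ix > 0)
            tile[ty][0] = in[iz * plane + iy * nx + (ix - 1)];
        if (threadIdx.x == TILE_X - 1 && ix < nx - 1)
            tile[ty][TILE_X + 1] = in[iz * plane + iy * nx + (ix + 1)];
        if (threadIdx.y == 0 && iy > 0)
            tile[0][tx] = in[iz * plane + (iy - 1) * nx + ix];
        if (threadIdx.y == TILE_Y - 1 && iy < ny - 1)
            tile[TILE_Y + 1][tx] = in[iz * plane + (iy + 1) * nx + ix];
        __syncthreads();

        // Update interior points only; domain boundaries are left untouched.
        if (ix > 0 && ix < nx - 1 && iy > 0 && iy < ny - 1) {
            out[iz * plane + iy * nx + ix] =
                c0 * cur +
                c1 * (tile[ty][tx - 1] + tile[ty][tx + 1] +
                      tile[ty - 1][tx] + tile[ty + 1][tx] +
                      below + above);
        }
        below = cur;
        cur   = above;
    }
}
```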
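The multi-stream pipeline mentioned above can be sketched as follows: the grid is split into chunks, and each chunk's host-to-device copy, kernel launch, and device-to-host copy are issued into a separate CUDA stream so that transfers of one chunk overlap with computation on another. The kernel here is a stand-in for the real finite-difference update, inter-chunk halo handling is omitted for brevity, and all names and sizes are illustrative.

```cuda
// Minimal sketch of overlapping transfers and computation with CUDA streams.
#include <cuda_runtime.h>
#include <vector>

__global__ void scale_chunk(float* data, int n, float alpha)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= alpha;          // stand-in for the real FD update
}

int main()
{
    const int nStreams = 4;
    const size_t nTotal = 1 << 24;        // total grid points
    const size_t nChunk = nTotal / nStreams;

    float *h_data, *d_data;
    cudaHostAlloc(&h_data, nTotal * sizeof(float), cudaHostAllocDefault); // pinned
    cudaMalloc(&d_data, nTotal * sizeof(float));
    for (size_t i = 0; i < nTotal; ++i) h_data[i] = 1.0f;

    std::vector<cudaStream_t> streams(nStreams);
    for (auto& s : streams) cudaStreamCreate(&s);

    // Issue copy-in, compute, copy-out for each chunk into its own stream.
    for (int c = 0; c < nStreams; ++c) {
        size_t off = c * nChunk;
        cudaMemcpyAsync(d_data + off, h_data + off, nChunk * sizeof(float),
                        cudaMemcpyHostToDevice, streams[c]);
        int threads = 256;
        int blocks  = (int)((nChunk + threads - 1) / threads);
        scale_chunk<<<blocks, threads, 0, streams[c]>>>(d_data + off,
                                                        (int)nChunk, 0.5f);
        cudaMemcpyAsync(h_data + off, d_data + off, nChunk * sizeof(float),
                        cudaMemcpyDeviceToHost, streams[c]);
    }
    cudaDeviceSynchronize();

    for (auto& s : streams) cudaStreamDestroy(s);
    cudaFree(d_data);
    cudaFreeHost(h_data);
    return 0;
}
```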
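The multi-GPU step relies on partitioning the grid along one axis and exchanging boundary planes directly between devices instead of routing them through host memory. The sketch below assumes a z-partitioned field with one halo plane on each side and uses peer-to-peer copies where the hardware supports them; array names, plane counts, and sizes are placeholders, not values from the thesis.

```cuda
// Minimal sketch of a peer-to-peer halo exchange across multiple GPUs.
#include <cuda_runtime.h>
#include <vector>

int main()
{
    int nGpus = 0;
    cudaGetDeviceCount(&nGpus);

    const size_t planeElems  = 512 * 512;   // one x-y plane
    const size_t localPlanes = 64;          // owned planes per GPU (plus 2 halo planes)
    const size_t localBytes  = (localPlanes + 2) * planeElems * sizeof(float);

    // Allocate each GPU's sub-domain and enable peer access to its neighbours.
    std::vector<float*> d_field(nGpus);
    for (int g = 0; g < nGpus; ++g) {
        cudaSetDevice(g);
        cudaMalloc(&d_field[g], localBytes);
        for (int peer : {g - 1, g + 1}) {
            if (peer < 0 || peer >= nGpus) continue;
            int canAccess = 0;
            cudaDeviceCanAccessPeer(&canAccess, g, peer);
            if (canAccess) cudaDeviceEnablePeerAccess(peer, 0);
        }
    }

    // Layout per GPU: plane 0 = lower halo, planes 1..localPlanes = owned,
    // plane localPlanes+1 = upper halo.
    size_t planeBytes = planeElems * sizeof(float);
    for (int g = 0; g + 1 < nGpus; ++g) {
        // GPU g's last owned plane -> GPU g+1's lower halo.
        cudaMemcpyPeer(d_field[g + 1], g + 1,
                       d_field[g] + localPlanes * planeElems, g, planeBytes);
        // GPU g+1's first owned plane -> GPU g's upper halo.
        cudaMemcpyPeer(d_field[g] + (localPlanes + 1) * planeElems, g,
                       d_field[g + 1] + planeElems, g + 1, planeBytes);
    }
    cudaDeviceSynchronize();

    for (int g = 0; g < nGpus; ++g) { cudaSetDevice(g); cudaFree(d_field[g]); }
    return 0;
}
```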
Keywords/Search Tags:Finite Differentce Algorithm, Non-Uniformed Memory Access, MIC Platform, GPU Platform, Heterogeneous Cooperation, Parallel Optimization