Research On Optimization Of Finite Difference Algorithm On MIC And GPU Platforms

Posted on: 2018-10-06    Degree: Master    Type: Thesis
Country: China    Candidate: X Hao    Full Text: PDF
GTID: 2348330563451318    Subject: Computer Science and Technology
Abstract/Summary:
In recent years, high-performance computer systems based on heterogeneous many-core architectures have achieved great success. Compared with the rapid progress of hardware, however, the development of high-performance applications is limited by heterogeneous programming models and parallel execution efficiency. As an important algorithm in electromagnetics, seismology, fluid mechanics, and other fields, the finite difference algorithm largely determines the performance of these applications, yet to date there has been little research on efficient parallel finite difference algorithms for many-core platforms. This thesis analyzes the characteristics of the finite difference algorithm, which is used to solve linear differential equations, and identifies three problems that arise when optimizing it on the MIC and GPU many-core platforms: non-uniform memory access, heterogeneous cooperation, and multi-node parallel execution.

On the MIC platform, we propose a three-step progressive method to optimize the 3D finite difference (3DFD) algorithm on Intel MIC (Many Integrated Core) coprocessors. First, basic optimizations such as branch elimination, loop unrolling, and invariant extraction are applied to reduce computational strength and remove obstacles to SIMD (Single Instruction Multiple Data) vectorization. Second, parallel optimizations such as data dependence analysis, loop tiling, and intrinsic SIMD instructions are used to take full advantage of the MIC coprocessor's many hardware threads and wide vector units. Finally, heterogeneous cooperative optimizations, including data transfer minimization and load balancing, are applied on the CPU+MIC platform so that the algorithm executes in parallel on both the CPU and the MIC.

On the GPU platform, we extend the optimization from a single GPU to multiple GPUs. First, we implement the parallel finite difference algorithm with the CUDA programming model and adopt a pipelined execution scheme with multiple CUDA streams to maximize the utilization of GPU resources. Second, to address the non-uniform memory access pattern, we make full use of the low-latency, programmable on-chip shared memory to implement blocked (tiled) parallel computation. Finally, by partitioning the program data, we implement the parallel finite difference computation on multiple GPU nodes and achieve linear speedup with the number of GPUs, using optimizations such as pre-computing the overlapping halo regions of the data grid and employing peer-to-peer transfers to eliminate data-transfer cost.

Finally, we design experiments to evaluate the effect of the above optimizations on the finite difference algorithm on the MIC and GPU platforms separately, verifying correctness against the corresponding serial programs. Experimental results show that the optimized 3DFD algorithm achieves up to roughly 120x speedup over the serial program on both the MIC and GPU platforms. Moreover, the parallel optimization achieves linear speedup across multiple GPU nodes, demonstrating good parallelism and scalability. The optimization methods used in this thesis, especially those for heterogeneous cooperation and multi-node parallelism, provide an optimization guide for other applications running on heterogeneous many-core platforms.
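To make the shared-memory blocking described above concrete, the following sketch shows a 7-point 3D finite-difference step that caches the current x-y plane in on-chip shared memory and marches along z in registers. It is a minimal illustration of the technique, not the thesis code: the kernel name, tile sizes, stencil coefficients, and the assumption that the grid dimensions are multiples of the tile sizes are all ours.

```cuda
// Minimal sketch of a shared-memory-tiled 3D finite-difference step.
// Launch with dim3 block(TILE_X, TILE_Y), grid(nx / TILE_X, ny / TILE_Y).
// Assumes nx is a multiple of TILE_X and ny a multiple of TILE_Y.
#include <cuda_runtime.h>

#define TILE_X 32
#define TILE_Y 8

// 7-point stencil: out = c0*in + c1*(sum of the six axis neighbours).
__global__ void fd_step_shared(const float* __restrict__ in,
                               float* __restrict__ out,
                               int nx, int ny, int nz,
                               float c0, float c1)
{
    __shared__ float tile[TILE_Y + 2][TILE_X + 2];

    int ix = blockIdx.x * TILE_X + threadIdx.x;   // global x index
    int iy = blockIdx.y * TILE_Y + threadIdx.y;   // global y index
    int tx = threadIdx.x + 1;                     // tile position (skip halo)
    int ty = threadIdx.y + 1;
    size_t plane = (size_t)nx * ny;

    // Keep the z-1 / z / z+1 values in registers so only the current
    // x-y plane has to live in shared memory.
    float below = in[iy * nx + ix];               // z = 0 plane
    float cur   = in[plane + iy * nx + ix];       // z = 1 plane

    for (int iz = 1; iz < nz - 1; ++iz) {
        float above = in[(iz + 1) * plane + iy * nx + ix];

        __syncthreads();                          // previous plane fully consumed
        tile[ty][tx] = cur;
        // Edge threads also load the halo cells their neighbours need.
        if (threadIdx.x == 0 && ix > 0)
            tile[ty][0] = in[iz * plane + iy * nx + (ix - 1)];
        if (threadIdx.x == TILE_X - 1 && ix < nx - 1)
            tile[ty][TILE_X + 1] = in[iz * plane + iy * nx + (ix + 1)];
        if (threadIdx.y == 0 && iy > 0)
            tile[0][tx] = in[iz * plane + (iy - 1) * nx + ix];
        if (threadIdx.y == TILE_Y - 1 && iy < ny - 1)
            tile[TILE_Y + 1][tx] = in[iz * plane + (iy + 1) * nx + ix];
        __syncthreads();

        // Update interior points only; domain boundaries are left untouched.
        if (ix > 0 && ix < nx - 1 && iy > 0 && iy < ny - 1) {
            out[iz * plane + iy * nx + ix] =
                c0 * cur +
                c1 * (tile[ty][tx - 1] + tile[ty][tx + 1] +
                      tile[ty - 1][tx] + tile[ty + 1][tx] +
                      below + above);
        }
        below = cur;
        cur   = above;
    }
}
```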
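The multi-stream pipeline mentioned above can be sketched as follows: the grid is split into chunks, and each chunk's host-to-device copy, kernel launch, and device-to-host copy are issued into a separate CUDA stream so that transfers of one chunk overlap with computation on another. The kernel here is a stand-in for the real finite-difference update, inter-chunk halo handling is omitted for brevity, and all names and sizes are illustrative.

```cuda
// Minimal sketch of overlapping transfers and computation with CUDA streams.
#include <cuda_runtime.h>
#include <vector>

__global__ void scale_chunk(float* data, int n, float alpha)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= alpha;          // stand-in for the real FD update
}

int main()
{
    const int nStreams = 4;
    const size_t nTotal = 1 << 24;        // total grid points
    const size_t nChunk = nTotal / nStreams;

    float *h_data, *d_data;
    cudaHostAlloc(&h_data, nTotal * sizeof(float), cudaHostAllocDefault); // pinned
    cudaMalloc(&d_data, nTotal * sizeof(float));
    for (size_t i = 0; i < nTotal; ++i) h_data[i] = 1.0f;

    std::vector<cudaStream_t> streams(nStreams);
    for (auto& s : streams) cudaStreamCreate(&s);

    // Issue copy-in, compute, copy-out for each chunk into its own stream.
    for (int c = 0; c < nStreams; ++c) {
        size_t off = c * nChunk;
        cudaMemcpyAsync(d_data + off, h_data + off, nChunk * sizeof(float),
                        cudaMemcpyHostToDevice, streams[c]);
        int threads = 256;
        int blocks  = (int)((nChunk + threads - 1) / threads);
        scale_chunk<<<blocks, threads, 0, streams[c]>>>(d_data + off,
                                                        (int)nChunk, 0.5f);
        cudaMemcpyAsync(h_data + off, d_data + off, nChunk * sizeof(float),
                        cudaMemcpyDeviceToHost, streams[c]);
    }
    cudaDeviceSynchronize();

    for (auto& s : streams) cudaStreamDestroy(s);
    cudaFree(d_data);
    cudaFreeHost(h_data);
    return 0;
}
```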
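The multi-GPU step relies on partitioning the grid along one axis and exchanging boundary planes directly between devices instead of routing them through host memory. The sketch below assumes a z-partitioned field with one halo plane on each side and uses peer-to-peer copies where the hardware supports them; array names, plane counts, and sizes are placeholders, not values from the thesis.

```cuda
// Minimal sketch of a peer-to-peer halo exchange across multiple GPUs.
#include <cuda_runtime.h>
#include <vector>

int main()
{
    int nGpus = 0;
    cudaGetDeviceCount(&nGpus);

    const size_t planeElems  = 512 * 512;   // one x-y plane
    const size_t localPlanes = 64;          // owned planes per GPU (plus 2 halo planes)
    const size_t localBytes  = (localPlanes + 2) * planeElems * sizeof(float);

    // Allocate each GPU's sub-domain and enable peer access to its neighbours.
    std::vector<float*> d_field(nGpus);
    for (int g = 0; g < nGpus; ++g) {
        cudaSetDevice(g);
        cudaMalloc(&d_field[g], localBytes);
        for (int peer : {g - 1, g + 1}) {
            if (peer < 0 || peer >= nGpus) continue;
            int canAccess = 0;
            cudaDeviceCanAccessPeer(&canAccess, g, peer);
            if (canAccess) cudaDeviceEnablePeerAccess(peer, 0);
        }
    }

    // Layout per GPU: plane 0 = lower halo, planes 1..localPlanes = owned,
    // plane localPlanes+1 = upper halo.
    size_t planeBytes = planeElems * sizeof(float);
    for (int g = 0; g + 1 < nGpus; ++g) {
        // GPU g's last owned plane -> GPU g+1's lower halo.
        cudaMemcpyPeer(d_field[g + 1], g + 1,
                       d_field[g] + localPlanes * planeElems, g, planeBytes);
        // GPU g+1's first owned plane -> GPU g's upper halo.
        cudaMemcpyPeer(d_field[g] + (localPlanes + 1) * planeElems, g,
                       d_field[g + 1] + planeElems, g + 1, planeBytes);
    }
    cudaDeviceSynchronize();

    for (int g = 0; g < nGpus; ++g) { cudaSetDevice(g); cudaFree(d_field[g]); }
    return 0;
}
```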
Keywords/Search Tags:Finite Differentce Algorithm, Non-Uniformed Memory Access, MIC Platform, GPU Platform, Heterogeneous Cooperation, Parallel Optimization