
Design And Optimization Of Parallel Algorithm Based On MIC Many-core Architecture

Posted on: 2018-11-18
Degree: Master
Type: Thesis
Country: China
Candidate: W Zhou
GTID: 2348330536988244
Subject: Engineering
Abstract/Summary:
In recent years, many-core processors (Many Integrated Core, MIC) have attracted growing attention, and many-core architecture has become the first choice for many supercomputers. The most widespread computations in science and engineering today are finite difference calculation, sparse matrix multiplication, and iterative solvers for linear equations, most of which are memory-intensive. The new-generation MIC processor Knights Landing (KNL) performs well on such memory-access-intensive algorithms. In this thesis, we design and optimize these parallel algorithms on the many-core architecture, studying the following three aspects.

First, the design and optimization of the sparse matrix-vector multiplication algorithm. Sparse matrix-vector multiplication (SpMV) is commonly used in solving large-scale linear equations, and the key to optimizing SpMV is the compression format of the sparse matrix. On the one hand, the MIC has wider vector units (512-bit SIMD) than the CPU, so using conventional CPU compression formats on the MIC leads to low SIMD utilization. On the other hand, the GPU and the MIC differ architecturally: compression formats designed for the GPU can solve the problem of low SIMD utilization, but they introduce a new problem of load imbalance across cores. This thesis proposes a new compression algorithm to solve both problems. In addition, mask optimization with the AVX-512 instruction set improves the performance of the proposed algorithm by about 30%.

Second, the design and optimization of the stencil algorithm, in two respects. Stencil computation is a common algorithm for solving finite difference problems; it is memory-intensive and sensitive to cache latency. To address its low first-level cache utilization and poor temporal locality, this thesis proposes a forward and backward update algorithm that improves the cache utilization of stencil computation. In addition, based on the characteristics of the stencil algorithm, we study the effect of different MCDRAM modes and cluster modes on computing performance, and we design a method that uses MCDRAM and DDR concurrently to mitigate the MCDRAM latency problem.

Third, the design and optimization of the BP neural network. A BP neural network is an artificial neural network trained with the Back Propagation (BP) algorithm, which places high demands on the floating-point capability of the processor. In this thesis, the BP neural network is vectorized and then optimized with register blocking and cache blocking. The experimental results show that the processing speed reaches 220 img/s on KNL with a speedup ratio of 13.2, which is 2.9 times that of the GPU and 2.28 times that of KNC.

In summary, this thesis analyzes sparse matrix-vector multiplication, finite difference stencil computation, and the BP neural network algorithm, designs and optimizes each of them for the many-core architecture, and achieves good acceleration.
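For readers unfamiliar with the SpMV problem described above, the following is a minimal sketch of SpMV over the conventional CSR (compressed sparse row) format. It is not the compression algorithm proposed in the thesis, which the abstract does not specify. The helper `simd_lane_utilization` is a hypothetical illustration: it assumes each row is processed in chunks of 8 lanes (8 double-precision values per 512-bit vector) and estimates what fraction of those lanes carry real nonzeros.

```python
from math import ceil


def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR format."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Nonzeros of this row occupy values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


def simd_lane_utilization(row_ptr, lanes=8):
    """Hypothetical estimate: fraction of SIMD lanes doing useful work
    when each row is padded up to a multiple of `lanes` elements."""
    nnz = row_ptr[-1]
    slots = sum(
        ceil((row_ptr[r + 1] - row_ptr[r]) / lanes) * lanes
        for r in range(len(row_ptr) - 1)
    )
    return nnz / slots if slots else 1.0


# Example matrix (assumed for illustration):
# [[4, 0, 1],
#  [0, 2, 0],
#  [3, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
x = [1.0, 2.0, 3.0]

y = spmv_csr(row_ptr, col_idx, values, x)
utilization = simd_lane_utilization(row_ptr)
print(y, utilization)
```

Under this simplified model, rows with only one or two nonzeros leave most of the 8 lanes idle, which is the kind of low SIMD utilization the abstract attributes to CPU-style formats on 512-bit hardware.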
Keywords/Search Tags:many-core architecture, sparse matrix vector multiplication, finite difference, BP neural network, memory intensive