
Research On Key Issues Of Performance Optimization In High Performance Computing Based On The Godson

Posted on: 2018-04-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M Zhang
GTID: 1318330512967469
Subject: Computer software and theory
Abstract/Summary:
High performance computing (HPC) is widely used in fields such as scientific computing, visualization and big data processing. In HPC, software performance is closely tied to the architecture of the platform. Godson is a representative domestic processor with independent intellectual property rights, and its 3B series is used in fields such as aerospace, bioinformatics and meteorology. However, because its software ecosystem is immature, software ported from other platforms is not optimized for it, and such unoptimized software performs poorly on these architectures. Owing to poor general memory-access performance and the weak coupling between software and the underlying hardware architecture, some scientific software runs inefficiently and cannot meet the performance requirements of applications on the Godson. This poor performance restricts the market adoption of the Godson, so improving software performance on the Godson has become very important. This thesis therefore describes our work on key issues of performance optimization in high performance computing on the Godson. Our major research and contributions are as follows:

(1) To address the weak coupling between mathematical libraries and the hardware architecture, we study GEMM, the kernel function of BLAS, and propose an optimization method based on asynchronous computation and memory access to improve the performance of compute-intensive applications. The method divides the task into groups and, exploiting the fact that computation and memory access can proceed separately, arranges the subtasks in a pipeline. It hides memory-access overhead through multi-channel memory access and accelerates the kernel computation with SIMD instructions. In addition, a performance evaluation method is proposed based on an analysis of the relationship between computation and memory access.

(2) To address the problem that conventional memory-access methods cannot meet the memory-access demands of compute-intensive applications, we study the FFTW library and propose a multi-level data partitioning optimization method to accelerate such applications. Using this method, the kernel algorithm, the radix-2 FFT, is studied and optimized, and the optimized FFT performs much better than the original. In the optimized algorithm, the locked L3 cache serves as intermediate storage. The FFT algorithm is restructured, and multi-level data partitioning is used to increase the reuse ratio of data in the cache, while the reuse rate of data in vector registers is raised by increasing the number of iterations. Meanwhile, SIMD instructions for butterfly operations and data shuffling are used to accelerate the kernel operations.

(3) Because of poorly chosen data placement, some compute-intensive applications achieve low parallel efficiency on the KD-90 platform. To solve this problem, we propose HPFCA, a parallel framework for implementing compute-intensive applications based on the PCAM (partitioning, communication, agglomeration, mapping) parallel framework. Issues such as task partitioning, inter-task parallelism, data redistribution, intra-node parallelism and single-core optimization are studied, and memory access is optimized by exploiting data locality. Using HPFCA, efficient parallel GEMM and FFT algorithms are implemented on the KD-90.

(4) To address the difficulty of executing applications concurrently on multi-core heterogeneous platforms, we study the solution of the high-precision 3D Poisson equation and propose an efficient load balancing algorithm. First, the multigrid method is used to discretize the 3D Poisson equation, transforming the problem into the solution of systems of linear equations. Then, the load ratio is determined from the computational power of each resource, and the tasks are partitioned and distributed to the different computing resources accordingly. With this load balancing method, concurrent performance on multi-core heterogeneous platforms improves greatly.
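Contribution (1) splits GEMM into groups of subtasks and separates memory access from computation. The following is a minimal portable C sketch of that split only, not the thesis's Godson kernel: `pack_a` plays the role of the memory-access subtask (copying a block of A into a contiguous buffer) and `gemm_kernel` the compute subtask. Here the two stages run one after the other; the pipelined overlap, multi-channel memory access and SIMD instructions described above are not shown. The block sizes `MC`/`KC` and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MC 32 /* rows of A per block (hypothetical tuning value) */
#define KC 64 /* depth of each k-panel (hypothetical tuning value) */

/* Memory-access stage: copy an mc x kc block of row-major A (leading
   dimension lda) into a contiguous buffer so the kernel reads it with
   unit stride. */
static void pack_a(const double *A, size_t lda, size_t mc, size_t kc,
                   double *buf)
{
    for (size_t i = 0; i < mc; i++)
        for (size_t p = 0; p < kc; p++)
            buf[i * kc + p] = A[i * lda + p];
}

/* Compute stage: C[mc x n] += buf[mc x kc] * B[kc x n]. */
static void gemm_kernel(const double *buf, size_t mc, size_t kc,
                        const double *B, size_t ldb, size_t n,
                        double *C, size_t ldc)
{
    for (size_t i = 0; i < mc; i++)
        for (size_t p = 0; p < kc; p++) {
            double a = buf[i * kc + p];
            for (size_t j = 0; j < n; j++)
                C[i * ldc + j] += a * B[p * ldb + j];
        }
}

/* C (m x n) += A (m x k) * B (k x n); all matrices row-major, contiguous.
   The static packing buffer makes this sketch non-reentrant. */
void gemm_blocked(size_t m, size_t n, size_t k,
                  const double *A, const double *B, double *C)
{
    static double buf[MC * KC];
    assert(A != NULL && B != NULL && C != NULL);
    for (size_t p0 = 0; p0 < k; p0 += KC) {
        size_t kc = (k - p0 < KC) ? k - p0 : KC;
        for (size_t i0 = 0; i0 < m; i0 += MC) {
            size_t mc = (m - i0 < MC) ? m - i0 : MC;
            pack_a(&A[i0 * k + p0], k, mc, kc, buf);               /* load */
            gemm_kernel(buf, mc, kc, &B[p0 * n], n, n, &C[i0 * n], n); /* compute */
        }
    }
}
```

Because each packed block is consumed by the kernel before the next one is loaded, a pipelined version would pack block i+1 into a second buffer while the kernel works on block i; that double-buffering step is where the asynchronous overlap would be inserted.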
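In contribution (4) the 3D Poisson equation is turned into linear systems. For reference, the standard second-order 7-point finite-difference discretization on a uniform grid with spacing h is shown below; this stencil is an assumption, since the abstract does not state which scheme the high-precision solver uses. In a multigrid solver, each grid level yields one such system.

```latex
\begin{aligned}
-\nabla^2 u &= f \quad \text{in } \Omega \subset \mathbb{R}^3, \\
\frac{6u_{i,j,k} - u_{i-1,j,k} - u_{i+1,j,k} - u_{i,j-1,k} - u_{i,j+1,k} - u_{i,j,k-1} - u_{i,j,k+1}}{h^2}
  &= f_{i,j,k}, \qquad \text{i.e. } A_h u_h = f_h .
\end{aligned}
```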
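Contribution (4) sets the load ratio from the computational power of each resource. A minimal sketch of that step, assuming each resource's power is summarized as one measured number (for example GFLOP/s) and that tasks are interchangeable units: tasks are split in proportion to power, with largest-remainder rounding so that no task is lost or duplicated. The function name and the `MAX_RESOURCES` limit are hypothetical, not taken from the thesis.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RESOURCES 64 /* hypothetical limit for this sketch */

/* Divide `total` interchangeable tasks among `r` computing resources in
   proportion to their measured computational power. Floors of the exact
   shares are assigned first; the leftover tasks go, one each, to the
   resources with the largest fractional remainders, so the shares always
   sum to `total`. */
void split_by_power(const double *power, size_t r, long total, long *share)
{
    double frac[MAX_RESOURCES];
    double sum = 0.0;
    long assigned = 0;

    assert(r > 0 && r <= MAX_RESOURCES && total >= 0);
    for (size_t i = 0; i < r; i++) {
        assert(power[i] >= 0.0);
        sum += power[i];
    }
    assert(sum > 0.0);

    for (size_t i = 0; i < r; i++) {
        double exact = (double)total * power[i] / sum;
        share[i] = (long)exact;             /* truncation == floor, exact >= 0 */
        frac[i] = exact - (double)share[i];
        assigned += share[i];
    }
    /* Fewer than r tasks remain, so each resource gets at most one more. */
    while (assigned < total) {
        size_t best = 0;
        for (size_t i = 1; i < r; i++)
            if (frac[i] > frac[best])
                best = i;
        share[best]++;
        frac[best] = -1.0;                  /* exclude from further extras */
        assigned++;
    }
}
```

For example, powers {1, 2, 1} and 100 tasks give shares {25, 50, 25}. Measuring power per resource once and splitting statically suits iterative solvers such as multigrid, where every iteration has roughly the same cost.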
Keywords/Search Tags: High performance computing, Performance Optimization for Applications, Godson-3B1500, FFTW, BLAS, CC-NUMA, Heterogeneous Computing