
Efficient Sparse Matrix-Vector Multiplication on New Many-Core Architectures

Posted on: 2019-01-15
Degree: Master
Type: Thesis
Country: China
Candidate: S Z Chen
Full Text: PDF
GTID: 2428330611493473
Subject: Computer Science and Technology
Abstract/Summary:
Sparse matrix-vector multiplication (SpMV) is one of the most common kernels in high-performance computing applications. Because the non-zero elements of a sparse matrix are irregularly distributed, implementing SpMV efficiently is difficult, and the algorithm is usually optimized separately for each high-performance parallel computing platform. New many-core architectures offer stronger processing power and higher memory bandwidth and represent an important trend in high-performance processor development, so designing efficient SpMV for these architectures is of great significance to high-performance computing applications.

We first systematically evaluate SpMV performance on two new many-core platforms, the Intel Knights Landing (KNL) and the ARMv8-based FT-2000Plus (FTP), analyzing in depth how architectural features, sparse matrix storage formats, and input datasets affect the algorithm's performance. Because storage format selection has traditionally relied on expert experience, it does not generalize across architectures and datasets. This thesis therefore builds a sparse matrix format selection model based on machine learning, realizing adaptive format selection for different architectures and datasets. On this basis, a hybrid storage format for the new many-core architectures is proposed, which aims to combine the advantages of the native storage formats. The main work is as follows:

(1) We thoroughly evaluate, for the first time, the performance of sparse matrix storage formats on the KNL and FTP many-core processors. The experiments cover 956 sparse matrix datasets and five mainstream sparse storage formats, and study the effects of NUMA binding, vectorization, and sparse matrix structure on SpMV performance across the two platforms. The results show that the most efficient sparse matrix storage format is closely related to the processor architecture and the structural characteristics of the input matrix.

(2) To help developers choose the optimal matrix representation, we employ machine learning to develop a predictive model. Our model is first trained offline on a set of training examples; the learned model can then predict the best matrix representation for any unseen input on a given architecture. We show that our model delivers on average 95% and 91% of the best available performance on KNL and FTP respectively, and it achieves this with no runtime profiling overhead.

(3) We propose HYB5, a hybrid sparse matrix storage format based on SELL-C-σ and CSR5. We partition the matrix into segments and design a corresponding SpMV algorithm. Experimental results on the KNL platform show that HYB5 outperforms the native formats SELL-C-σ and CSR5, with speedups of 1.58x and 1.62x respectively.
Keywords/Search Tags: SpMV, Sparse Matrix, Many-Core Architectures, Performance Analysis, Performance Optimization