
Efficient Sparse Matrix-Vector Multiplication on New Many-Core Architectures

Posted on: 2019-01-15
Degree: Master
Type: Thesis
Country: China
Candidate: S Z Chen
Full Text: PDF
GTID: 2428330611493473
Subject: Computer Science and Technology
Abstract/Summary:
Sparse matrix-vector multiplication (SpMV) is one of the most common kernels in high-performance computing applications. Because the non-zero elements of a sparse matrix are irregularly distributed, implementing SpMV efficiently is difficult, and the algorithm is usually optimized separately for each high-performance parallel computing platform. New many-core architectures offer stronger processing power and higher memory bandwidth and represent an important trend in high-performance processor development, so designing efficient SpMV for these architectures is of great significance to high-performance computing applications.

We first systematically evaluate SpMV performance on two new many-core platforms, the Intel Knights Landing (KNL) and the ARMv8-based FT-2000Plus (FTP), analyzing in depth how architectural features, sparse matrix storage formats, and input datasets affect the algorithm's performance. Because storage format selection has traditionally relied on expert experience, it does not generalize across architectures and datasets. This thesis therefore builds a sparse matrix format selection model based on machine learning, realizing adaptive format selection for different architectures and datasets. On this basis, a hybrid storage format for the new many-core architectures is proposed, which aims to combine the advantages of the native storage formats. The main work is as follows:

(1) We thoroughly evaluate, for the first time, the performance of sparse matrix storage formats on the KNL and FTP many-core processors. The experiments cover 956 sparse matrix datasets and five mainstream sparse storage formats, and study the effects of NUMA binding, vectorization, and sparse matrix structure on SpMV performance across the two platforms. The results show that the most efficient sparse matrix storage format is closely related to the processor architecture and the structural characteristics of the input matrix.

(2) To help developers choose the optimal matrix representation, we employ machine learning to develop a predictive model. Our model is first trained offline on a set of training examples; the learned model can then predict the best matrix representation for any unseen input on a given architecture. We show that our model delivers on average 95% and 91% of the best available performance on KNL and FTP respectively, and it achieves this with no runtime profiling overhead.

(3) We propose HYB5, a hybrid sparse matrix storage format based on SELL-C-σ and CSR5. We partition the matrix into segments and design a corresponding SpMV algorithm. Experimental results on the KNL platform show that HYB5 outperforms the native formats SELL-C-σ and CSR5, with speedups of 1.58x and 1.62x respectively.
Keywords/Search Tags: SpMV, Sparse Matrix, Many-Core Architectures, Performance Analysis, Performance Optimization