
Research On Performance Tuning Of Matrix Multiplication Based On GPU

Posted on: 2016-09-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J. Yin
Full Text: PDF
GTID: 1108330461485431
Subject: Computer software and theory
Abstract/Summary:
Owing to power consumption, cooling, and other constraints, the performance of traditional single-core processors has failed to keep pace with the growth of hardware resources, while the innovative applications that have blossomed in high-performance computing in recent years demand ever more computing power. Compared with single-core processors, multi-core/many-core processors can exploit thread-level parallelism to deliver this higher performance, and they have been widely adopted in both academia and industry. However, despite their high peak FLOPS and strong computational capability, multi-core/many-core processors also have complex architectures and programming environments, so extracting their full computing power remains a prominent problem. Addressing it requires identifying the core kernels of important applications and optimizing them according to the features of multi-core/many-core processors.

This thesis takes dense matrix-vector multiplication (GEMV) and sparse matrix-vector multiplication (SpMV) as representatives of regular and irregular kernels, respectively, and carries out the following research.

1) A cache-blocked GEMV algorithm is designed for many-core GPUs that makes better use of the hardware by increasing thread-level parallelism: instead of assigning a single thread to each element of the result vector y, as in the traditional algorithm, all threads of a warp cooperate to compute one element. A second algorithm adds register-level data reuse on top of the first: registers are the fastest storage on a multi-core/many-core GPU, and reusing data in them is an effective way to relieve the memory-access bottleneck.
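The warp-cooperative GEMV idea in point 1) can be sketched as follows. This is an illustrative Python model, not the thesis's CUDA implementation: each of the `WARP_SIZE` "lanes" accumulates a strided partial sum over one matrix row, and the partial sums are then reduced into one element of y (on a GPU the reduction would use warp-level primitives). The names `warp_gemv_row` and `WARP_SIZE` are our own.

```python
WARP_SIZE = 32  # number of threads per warp on NVIDIA GPUs

def warp_gemv_row(row, x):
    """Compute dot(row, x) the way a cooperating warp would."""
    n = len(row)
    partial = [0.0] * WARP_SIZE
    # Lane k handles columns k, k + 32, k + 64, ... (a strided sweep;
    # on a GPU this access pattern gives coalesced global-memory loads).
    for lane in range(WARP_SIZE):
        for j in range(lane, n, WARP_SIZE):
            partial[lane] += row[j] * x[j]
    # Warp-level reduction of the 32 partial sums into one y element.
    return sum(partial)

def gemv(A, x):
    """y = A * x, one warp per row."""
    return [warp_gemv_row(row, x) for row in A]
```

The key design point is that the warp's strided column sweep turns the row traversal into contiguous, coalescable memory accesses, which is where the parallelism gain over the one-thread-per-row scheme comes from.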
Experimental analysis shows that the new algorithms outperform the traditional library routine, with about a 10% improvement for small matrices and for matrices with more rows than columns. The thesis also studies how the degree of register reuse interacts with the features of many-core GPUs.

2) To make better use of many-core processors, SpMV is optimized with a new storage format, HYB-I, built on the HYB format: the matrix is partitioned further so that the COO part shrinks and more entries are moved into the ELL part. Careful experimental study of the partitioning parameter shows a substantial performance gain after this optimization; compared with the HYB-based algorithm, the HYB-I-based algorithm performs better on both kinds of platform, with up to a 17% improvement in the ideal case.

3) A new cache-blocking method for the SpMV algorithm on multi-core/many-core GPUs partitions the matrix, stores the parts in CSR format, and thereby improves memory-access efficiency, which raises SpMV performance. Experiments show that the cache-blocked method runs up to 5 times faster than its plain CSR counterpart.
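The HYB-style split that point 2) refines can be sketched as below. Each row keeps its first K nonzeros in a regular, zero-padded ELL part, and any overflow goes to a COO part; HYB-I, per the abstract, tunes the partition so the COO part is smaller. The cutoff rule used here (the mean row length) and the helper names are our own illustrative assumptions, not the thesis's formula.

```python
def hyb_split(rows):
    """rows: list of [(col, val), ...] per row. Returns (ell, coo)."""
    lengths = [len(r) for r in rows]
    k = max(1, sum(lengths) // max(1, len(rows)))  # assumed cutoff: mean row length
    ell = []   # per row: exactly k (col, val) entries, padded with (0, 0.0)
    coo = []   # overflow entries as (row, col, val) triples
    for i, r in enumerate(rows):
        head = r[:k]
        ell.append(head + [(0, 0.0)] * (k - len(head)))
        coo.extend((i, c, v) for c, v in r[k:])
    return ell, coo

def hyb_spmv(ell, coo, x, n_rows):
    """y = A * x from the two-part HYB representation."""
    y = [0.0] * n_rows
    for i, row in enumerate(ell):   # regular ELL part: uniform-width rows,
        for c, v in row:            # ideal for lockstep GPU threads
            y[i] += v * x[c]
    for i, c, v in coo:             # irregular COO remainder
        y[i] += v * x[c]
    return y
```

The zero padding makes the ELL part safe to process with uniform per-row loops; shrinking the COO remainder, as HYB-I does, moves more work into that regular, GPU-friendly part.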
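The cache-blocking idea in point 3) can be sketched as a column-strip traversal of a CSR matrix: each strip touches only a narrow slice of x, which is the region meant to stay cache-resident. This is a minimal Python model under our own assumptions (strip width, function name); a real implementation would pre-partition the CSR arrays per strip rather than rescan them as this sketch does.

```python
def csr_blocked_spmv(indptr, indices, data, x, block=2):
    """y = A * x for A in CSR form, processed in column strips of width `block`."""
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    n_cols = len(x)
    # One column strip at a time: within a strip, every access to x falls
    # in [c0, c0 + block), so that slice of x can stay in cache.
    for c0 in range(0, n_cols, block):
        c1 = c0 + block
        for i in range(n_rows):
            for p in range(indptr[i], indptr[i + 1]):
                if c0 <= indices[p] < c1:
                    y[i] += data[p] * x[indices[p]]
    return y
```

The result is identical to plain CSR SpMV; the speedup reported in the abstract comes from the improved locality of the x accesses, not from doing less arithmetic.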
Keywords/Search Tags: GPU, SpMV, GEMV, algorithm optimization