Computing SpMV on FPGAs

Posted on: 2017-12-12

Degree: Ph.D.

Type: Dissertation

University: Iowa State University

Candidate: Townsend, Kevin R


GTID: 1448390005478527

Subject: Computer Engineering

Abstract/Summary:

There are hundreds of papers on accelerating sparse matrix-vector multiplication (SpMV); however, only a handful target FPGAs. Some claim that FPGAs inherently perform worse than CPUs and GPUs. For some applications, such as matrix-matrix multiplication and dense matrix-vector multiplication, FPGAs do perform worse: CPUs and GPUs have so much memory bandwidth and floating-point computation power that FPGAs cannot compete. However, SpMV's low ratio of computations to memory operations and its irregular memory access trip up both CPUs and GPUs. We see this as a leveling of the playing field for FPGAs.

Our implementation rests on three pillars: matrix traversal, multiply-accumulator design, and matrix compression. First, whereas most SpMV implementations traverse the matrix in row-major order, we mix column and row traversal. Second, to accommodate the new traversal, the multiply-accumulator stores many intermediate y values. Third, we compress the matrix to increase the rate at which it is transferred from RAM to the FPGA. Together, these pillars enable our SpMV implementation to perform competitively with CPUs and GPUs.
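The abstract contrasts conventional row-major SpMV with a mixed column/row traversal, a multiply-accumulator that holds many intermediate y values, and matrix compression. The Python sketch below is a hypothetical software model of those ideas, not the dissertation's FPGA design: the names `spmv_mixed`, `block_cols`, `delta_encode`, and `delta_decode` are illustrative assumptions, and delta-encoding column indices is just one common compression technique (the abstract does not say which scheme is used). The baseline also hints at why SpMV is memory-bound: each nonzero costs one multiply and one add, yet in a common compressed sparse row (CSR) layout with 8-byte values and 4-byte column indices it streams at least 12 bytes of matrix data, roughly 0.17 flop/byte. The sketch stores nonzeros as (row, col, value) triples only for readability.

```python
from typing import List, Tuple

# A nonzero stored as (row, col, value); this COO form is used only for clarity.
Nonzero = Tuple[int, int, float]


def spmv_row_major(n_rows: int, nnz: List[Nonzero], x: List[float]) -> List[float]:
    """Baseline y = A*x that visits nonzeros in row-major order."""
    y = [0.0] * n_rows
    for r, c, v in sorted(nnz):  # sort by (row, col)
        y[r] += v * x[c]         # 2 flops per nonzero; x[c] is an irregular read
    return y


def spmv_mixed(n_rows: int, nnz: List[Nonzero], x: List[float],
               block_cols: int) -> List[float]:
    """Hypothetical mixed traversal: columns are cut into vertical blocks
    visited left to right, and rows are visited in order inside each block.
    The y list stands in for a multiply-accumulator that keeps a partial
    sum for every row alive, since a row may be revisited in every block."""
    y = [0.0] * n_rows
    n_cols = len(x)
    for start in range(0, n_cols, block_cols):
        end = min(start + block_cols, n_cols)
        # Only x[start:end] is needed while this block streams through.
        block = sorted(e for e in nnz if start <= e[1] < end)
        for r, c, v in block:
            y[r] += v * x[c]
    return y


def delta_encode(cols: List[int]) -> List[int]:
    """Store sorted column indices as gaps; small gaps need fewer bits."""
    out, prev = [], 0
    for c in cols:
        out.append(c - prev)
        prev = c
    return out


def delta_decode(gaps: List[int]) -> List[int]:
    """Invert delta_encode with a running prefix sum over the gaps."""
    out, acc = [], 0
    for g in gaps:
        acc += g
        out.append(acc)
    return out


if __name__ == "__main__":
    # Hypothetical 3x4 matrix with small integer values, so both traversal
    # orders produce bit-identical floating-point results.
    A = [(0, 0, 1.0), (0, 3, 2.0), (1, 1, 3.0), (2, 0, 4.0), (2, 2, 5.0)]
    x = [1.0, 2.0, 3.0, 4.0]
    print(spmv_row_major(3, A, x))            # [9.0, 6.0, 19.0]
    print(spmv_mixed(3, A, x, block_cols=2))  # [9.0, 6.0, 19.0]
    print(delta_encode([0, 3, 4, 9]))         # [0, 3, 1, 5]
```

In this model, blocking by columns bounds the slice of x that must be kept close to the arithmetic units at any moment, at the cost of keeping one partial sum per row live across blocks. That trade-off is the kind of pressure that calls for an accumulator able to store many intermediate y values. With real floating-point data the two traversal orders can differ in the last bits, because addition is not associative.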

Keywords/Search Tags:

SpMV, FPGAs, Matrix, CPUs and GPUs

Related items

1	Parallel Design And Optimization Of SpMV On ARM Multi-core Platform
2	Sparse Matrix Vector Multiplication Based On CPU And GPU
3	Efficient Sparse Matrix Vector Multiplications On New Many-core Architectures
4	Research On Performance Tuning Of Matrix Multiplication Based On GPU
5	Sparse Matrix Matrix-Vector Multiplication And Auto-Tuning
6	Design And Verification Of DMA For Sparse Matrix Vector Multiplication
7	Design And Implementation Of The Multi-CPUs Control System Based On RS-485 Bus
8	Exploiting Parallelism in GPUs
9	Research On Optimization Of Main Memory Database Query Execution On Multi-core CPUs
10	Exploiting Data-Parallelism in GPUs