
Research On FPGA-based Large-scale Floating-point Matrix Multiplication Accelerator

Posted on: 2016-04-08
Degree: Master
Type: Thesis
Country: China
Candidate: J Z Shen
GTID: 2348330536467353
Subject: Computer Science and Technology
Abstract/Summary:
With the development of technology, the capacity of FPGAs (Field Programmable Gate Arrays) continues to increase, and characteristics such as good parallelism, reconfigurability, and low power consumption have made the FPGA one of the key components for implementing reconfigurable computing. As embedded applications deepen and grow in scale, their acceleration requirements keep increasing, and the unique advantages of FPGAs in accelerating applications have attracted extensive attention both domestically and internationally. Matrix multiplication is a kernel of scientific and engineering computing and one of the most fundamental matrix operations; it is widely used in application fields such as image processing, electronic communications, materials science simulation, and data mining. Because of its high complexity and low efficiency, large-scale matrix multiplication often becomes the bottleneck of system performance. Thus, accelerating matrix multiplication on FPGAs has long been a focus of study in the embedded field.

This thesis presents an FPGA-based floating-point matrix multiplication accelerator with high performance and high storage efficiency, which makes full use of the performance advantages of Xilinx IP cores and improves the utilization of on-chip computing, storage, and bandwidth resources. In addition, to address the low computational efficiency of the chained matrix multiplication accelerator when accelerating non-uniform matrix multiplication, we propose a novel optimal blocking technique that further improves the computational efficiency of the accelerator; it is implemented by building a mathematical model to calculate the optimal block size.

We implement the design on our MASA-CLUSTER platform. Experimental results show that the design reaches more than 98% computational efficiency, and the measured performance reaches 19 GFLOPS with 128 processing elements (PEs) at a working frequency of 150 MHz. The design also scales well. We apply the optimal blocking technique to the non-uniform matrix multiplications of a convolutional neural network, and the results show that it improves the computational efficiency of the matrix multiplication accelerator by 12%, providing good support for accelerating applications.
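The thesis targets FPGA hardware, but the blocking idea it relies on can be illustrated in software. The C sketch below shows a generic tiled (blocked) single-precision matrix multiplication for possibly non-uniform dimensions; the block sizes BM, BK, and BN stand in for whatever the thesis's mathematical model would select for the accelerator, and the function and parameter names are illustrative rather than taken from the original design.

```c
#include <stddef.h>

/* Blocked (tiled) single-precision matrix multiply: C += A * B,
 * where A is M x K, B is K x N, C is M x N, all stored row-major.
 * BM/BK/BN are the block sizes; in the accelerator they would be
 * chosen by the optimal-blocking model (values here are illustrative). */
void blocked_sgemm(size_t M, size_t K, size_t N,
                   const float *A, const float *B, float *C,
                   size_t BM, size_t BK, size_t BN)
{
    for (size_t i0 = 0; i0 < M; i0 += BM)
        for (size_t k0 = 0; k0 < K; k0 += BK)
            for (size_t j0 = 0; j0 < N; j0 += BN) {
                /* Edge blocks of a non-uniform matrix may be smaller. */
                size_t iMax = (i0 + BM < M) ? i0 + BM : M;
                size_t kMax = (k0 + BK < K) ? k0 + BK : K;
                size_t jMax = (j0 + BN < N) ? j0 + BN : N;
                /* This inner block product is the work that would be
                 * streamed to the chained processing elements. */
                for (size_t i = i0; i < iMax; ++i)
                    for (size_t k = k0; k < kMax; ++k) {
                        float a = A[i * K + k];
                        for (size_t j = j0; j < jMax; ++j)
                            C[i * N + j] += a * B[k * N + j];
                    }
            }
}
```

In this picture, the optimal-blocking model's job is to pick BM, BK, and BN so that each block fits the on-chip storage and keeps all PEs busy even when M, K, and N are very unequal, which is exactly the non-uniform case the thesis reports a 12% efficiency gain for.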
Keywords/Search Tags: FPGA, matrix multiplication accelerator, optimal blocking technique