
Research On FPGA-based Large-scale Floating-point Matrix Multiplication Accelerator

Posted on: 2016-04-08
Degree: Master
Type: Thesis
Country: China
Candidate: J Z Shen
GTID: 2348330536467353
Subject: Computer Science and Technology
Abstract/Summary:
With the development of technology, the capacity of FPGAs (Field Programmable Gate Arrays) continues to increase, and characteristics such as good parallelism, reconfigurability, and low power consumption have made the FPGA one of the key components for implementing reconfigurable computing. As embedded applications deepen and grow in scale, their acceleration requirements keep increasing, and the unique advantages of FPGAs in accelerating applications have attracted extensive attention both domestically and internationally. Matrix multiplication is a kernel of scientific and engineering computing and one of the most fundamental matrix operations; it is widely used in application fields such as image processing, electronic communications, materials science simulation, and data mining. Because of its high complexity and low efficiency, large-scale matrix multiplication often becomes the bottleneck of system performance. Thus, accelerating matrix multiplication on FPGAs has long been a focus of study in the embedded field.

This thesis presents an FPGA-based floating-point matrix multiplication accelerator with high performance and high storage efficiency, which makes full use of the performance advantages of Xilinx IP cores and improves the utilization of on-chip computing, storage, and bandwidth resources. In addition, to address the low computational efficiency of the chained matrix multiplication accelerator when accelerating non-uniform matrix multiplication, we propose a novel optimal blocking technique that further improves the computational efficiency of the accelerator; it is implemented by building a mathematical model to calculate the optimal block size.

We implement the design on our MASA-CLUSTER platform. Experimental results show that the design reaches more than 98% computational efficiency, and the measured performance reaches 19 GFLOPS with 128 processing elements (PEs) at a working frequency of 150 MHz. The design also scales well. We apply the optimal blocking technique to the non-uniform matrix multiplications of a convolutional neural network, and the results show that it improves the computational efficiency of the matrix multiplication accelerator by 12%, providing good support for accelerating applications.
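The thesis targets FPGA hardware, but the blocking idea it relies on can be illustrated in software. The C sketch below shows a generic tiled (blocked) single-precision matrix multiplication for possibly non-uniform dimensions; the block sizes BM, BK, and BN stand in for whatever the thesis's mathematical model would select for the accelerator, and the function and parameter names are illustrative rather than taken from the original design.

```c
#include <stddef.h>

/* Blocked (tiled) single-precision matrix multiply: C += A * B,
 * where A is M x K, B is K x N, C is M x N, all stored row-major.
 * BM/BK/BN are the block sizes; in the accelerator they would be
 * chosen by the optimal-blocking model (values here are illustrative). */
void blocked_sgemm(size_t M, size_t K, size_t N,
                   const float *A, const float *B, float *C,
                   size_t BM, size_t BK, size_t BN)
{
    for (size_t i0 = 0; i0 < M; i0 += BM)
        for (size_t k0 = 0; k0 < K; k0 += BK)
            for (size_t j0 = 0; j0 < N; j0 += BN) {
                /* Edge blocks of a non-uniform matrix may be smaller. */
                size_t iMax = (i0 + BM < M) ? i0 + BM : M;
                size_t kMax = (k0 + BK < K) ? k0 + BK : K;
                size_t jMax = (j0 + BN < N) ? j0 + BN : N;
                /* This inner block product is the work that would be
                 * streamed to the chained processing elements. */
                for (size_t i = i0; i < iMax; ++i)
                    for (size_t k = k0; k < kMax; ++k) {
                        float a = A[i * K + k];
                        for (size_t j = j0; j < jMax; ++j)
                            C[i * N + j] += a * B[k * N + j];
                    }
            }
}
```

In this picture, the optimal-blocking model's job is to pick BM, BK, and BN so that each block fits the on-chip storage and keeps all PEs busy even when M, K, and N are very unequal, which is exactly the non-uniform case the thesis reports a 12% efficiency gain for.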
Keywords/Search Tags: FPGA, matrix multiplication accelerator, optimal blocking technique