Research Of Accelerating Floating-point Matrix Multiplication Based On FPGA Cluster

Posted on:2018-11-13

Degree:Master

Type:Thesis

Country:China

Candidate:J W Du

GTID:2428330569998653

Subject:Computer Architecture

Abstract/Summary:

With the development of big data and cloud computing, the computational intensity of high-performance computing (HPC) workloads keeps increasing, and HPC systems are required to deliver not only high computing capability but also low power consumption. Consequently, new software programming frameworks with high flexibility and fault tolerance, offering good scalability and high computational efficiency, are deployed on server clusters. In addition, hardware accelerators are deployed in the compute nodes to further improve the performance of data analysis and processing.

Building on previous research, this thesis proposes an FPGA-based hardware acceleration platform for large-scale matrix operations. The platform consists of several Xilinx Virtex-6 FPGA evaluation boards (EVBs) and computing servers. The EVBs are connected to the servers through PCIe interfaces and communicate with one another through a 14 GB/s fiber interface. An identical offload engine (OE) module is implemented on each EVB to preprocess data messages. The correctness of the platform is verified by simulation, and experiments on matrix-vector multiplication of various dimensions confirm that the design achieves high computing performance.

Based on an analysis of the floating-point matrix multiplication algorithm and the FPGA cluster architecture, we designed and implemented a parallel floating-point matrix computation module in the Verilog hardware description language. The design reduces computational complexity and resource utilization and improves computational efficiency. The dimensions of the two input floating-point matrices can be configured arbitrarily, and the number of processing elements (PEs) can be set flexibly according to the resources available on the FPGA. Because adjacent PEs do not interact, the module has good portability and scalability.

We also design a high-speed interface for communication between the PCIe co-processor and the CPU of the compute node. The programmable FPGA system works together with the driver on the compute node so that the software and hardware cooperate. The performance of the OE module and its floating-point matrix multiplier module is analyzed and verified through simulation and synthesis. Compared with an Intel Core i5-4690 CPU and with a single FPGA board using the same processing units, the proposed acceleration platform achieves better performance, with good parallel efficiency and computational efficiency. Finally, the performance of floating-point matrix multiplication is compared across matrix dimensions: the design achieves high computational performance, nearly twice that of a single FPGA board.
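To make the PE organization described in the abstract more concrete, here is a minimal Python software model; it is not the thesis's Verilog design. It assumes (our assumption, not stated in the abstract) that each PE owns a disjoint set of output columns of C = A·B, assigned round-robin, which is one simple way to obtain PEs that never exchange data with their neighbors. The names `assign_columns`, `pe_compute`, `matmul_pe_array`, and `gflops` are hypothetical. `n_pes` stands in for the PE count chosen to fit the FPGA's resources, and the loop over PEs runs sequentially here, whereas in hardware the PEs would operate concurrently.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def assign_columns(n_cols: int, n_pes: int) -> List[List[int]]:
    """Round-robin assignment of output columns to PEs (assumed scheme)."""
    if n_pes < 1:
        raise ValueError("at least one PE is required")
    return [list(range(pe, n_cols, n_pes)) for pe in range(n_pes)]


def pe_compute(a: Matrix, b: Matrix, cols: List[int]) -> Dict[Tuple[int, int], float]:
    """One PE: multiply-accumulate every row of A against its own columns of B."""
    results: Dict[Tuple[int, int], float] = {}
    for j in cols:
        for i, row in enumerate(a):
            acc = 0.0  # private accumulator; never shared with another PE
            for k, a_ik in enumerate(row):
                acc += a_ik * b[k][j]
            results[(i, j)] = acc
    return results


def matmul_pe_array(a: Matrix, b: Matrix, n_pes: int) -> Matrix:
    """C = A * B for A (M x K) and B (K x N) using n_pes independent PEs."""
    m, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError("inner dimensions of A and B must match")
    n = len(b[0])
    c = [[0.0] * n for _ in range(m)]
    # Sequential here; on the FPGA each PE would run concurrently.
    for cols in assign_columns(n, n_pes):
        for (i, j), value in pe_compute(a, b, cols).items():
            c[i][j] = value
    return c


def gflops(m: int, n: int, k: int, seconds: float) -> float:
    """Dense-product throughput: 2*M*N*K flops (one multiply + one add per MAC)."""
    return 2.0 * m * n * k / seconds / 1e9
```

For example, `matmul_pe_array([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]], 3)` returns `[[19.0, 22.0], [43.0, 50.0]]`; when there are more PEs than output columns, the extra PEs simply receive no work, and a matrix-vector product is the special case N = 1. The `gflops` helper uses the common 2·M·N·K operation count for a dense product; the abstract does not state which performance metric the thesis itself used.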

Keywords/Search Tags:

HPC, Accelerator cluster, Floating-Point Matrix Multiplication, Computational Performance

Related items

1	Research On FPGA-based Large-scale Floating-point Matrix Multiplication Accelerator
2	A novel algorithm for fixed-point and floating-point matrix multiplication on a FPGA
3	Research On Key Technology Of Accelerating Floating-Point Matrix Multiplication Based On FPGA In Embedded Environment
4	Analysis-Driven Design of Parallel Floating-Point Matrix Multiplication for Implementation in Reconfigurable Logic
5	The Research And Implementation Of High Performance SIMD Floating-point Multiplication Accumulator Unit For FT-XDSP
6	The Design And Performance Optimization Of Vector Floating-Point Arithmetic Accelerator Based On IEEE-754 Standard
7	Research On Floating Point Multiply Add Unit Of High Performance Microprocessor
8	Analysis And Design Of High-performance Floating-Point Unit
9	The Design And Implementation Of Floating Point Unit Based On ARMv7 Floating Point Instruction Set
10	The Study Of Error-Controllable Double-Precision Floating-Point Arithmetic Accelerator Based On FPGA