
Research on Accelerating Floating-Point Matrix Multiplication Based on an FPGA Cluster

Posted on: 2018-11-13
Degree: Master
Type: Thesis
Country: China
Candidate: J W Du
GTID: 2428330569998653
Subject: Computer Architecture
Abstract/Summary:
With the development of big data and cloud computing, the computational intensity of HPC workloads keeps increasing, and HPC systems are now required to provide not only high computing capability but also low power consumption. To this end, new software programming frameworks with high flexibility and fault tolerance have been deployed on server clusters, offering good scalability and high computational efficiency. In addition, hardware accelerators are deployed in compute nodes to further improve the performance of data analysis and processing. Building on previous research, this thesis proposes an FPGA-based hardware acceleration platform for large-scale matrix operations. The architecture consists of several Xilinx Virtex-6 FPGA evaluation boards (EVBs) and computing servers. The boards are connected to the servers through a PCIe interface, and the EVBs communicate with one another through a 14 GB/s fiber interface. The same offload engine (OE) module is implemented in each EVB to preprocess data messages independently. The correctness of the proposed platform is verified by simulation, and experiments on matrix-vector multiplication of different dimensions confirm that the design achieves high computing performance.

Based on an analysis of floating-point matrix multiplication algorithms and the FPGA cluster architecture, we designed and implemented a parallel floating-point matrix multiplication hardware module in the Verilog hardware description language. The design reduces computational complexity and resource utilization and improves computational efficiency. The module supports arbitrarily configurable dimensions for the two input floating-point matrices, and the number of processing elements (PEs) can be set flexibly according to the logic resources available on the FPGA. Because adjacent PE units do not interact with each other, the design has good portability and scalability.

We also design a high-speed interface for communication between the PCIe co-processor and the compute node CPU. The programmable FPGA system works together with the driver on the compute node so that software and hardware cooperate as one system. The performance of the OE module and its floating-point matrix multiplier module is analyzed and verified through simulation and synthesis. A comparison against an Intel Core i5-4690 CPU and against a single FPGA board with the same processing units shows that the proposed acceleration platform achieves better performance, with good parallel efficiency and computational efficiency. Finally, we compare the performance of floating-point matrix multiplication across different matrix dimensions: the design achieves high computational performance and is nearly twice as fast as a single FPGA board.
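To illustrate why a configurable number of PEs with no communication between neighbors is possible, here is a minimal software model in Python. It assumes a row-wise partitioning of the output, which the abstract does not specify. The thesis's actual module is written in Verilog, so this is only a sketch. The names `pe_compute`, `partition_rows`, and `parallel_matmul` are hypothetical. Each output row of C = A x B depends only on one row of A and all of B, so each PE can work on its own slice of rows without exchanging data with any other PE.

```python
from typing import List

Matrix = List[List[float]]


def pe_compute(a_rows: Matrix, b: Matrix) -> Matrix:
    """Hypothetical processing element (PE): multiply its slice of A's rows by B.

    It reads only its own rows of A plus the shared matrix B, so it never
    needs data from a neighboring PE.
    """
    n_inner = len(b)
    n_cols = len(b[0]) if b else 0
    out: Matrix = []
    for row in a_rows:
        acc = [0.0] * n_cols
        for k in range(n_inner):
            a_ik = row[k]
            b_row = b[k]
            for j in range(n_cols):
                acc[j] += a_ik * b_row[j]  # floating-point multiply-accumulate
        out.append(acc)
    return out


def partition_rows(m: int, num_pes: int) -> List[range]:
    """Split m rows into num_pes contiguous, nearly equal ranges (some may be empty)."""
    base, extra = divmod(m, num_pes)
    ranges: List[range] = []
    start = 0
    for p in range(num_pes):
        size = base + (1 if p < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def parallel_matmul(a: Matrix, b: Matrix, num_pes: int) -> Matrix:
    """Compute a x b by distributing row blocks over num_pes independent PEs.

    num_pes is a free parameter, mirroring the idea of sizing the PE array
    to the FPGA's available resources (an assumption of this model).
    """
    if num_pes < 1:
        raise ValueError("num_pes must be at least 1")
    if a and len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    result: Matrix = []
    for rows in partition_rows(len(a), num_pes):
        # In hardware these PEs would run concurrently; this model runs them in sequence.
        result.extend(pe_compute([a[i] for i in rows], b))
    return result
```

Changing `num_pes` only changes how rows are distributed, not the result. For example, `parallel_matmul(A, B, 1)` and `parallel_matmul(A, B, 4)` return the same matrix. This mirrors the claim that the PE count can be scaled to the chip's resources without altering the computation.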
Keywords: HPC, Accelerator Cluster, Floating-Point Matrix Multiplication, Computing Performance