
The Design And Performance Optimization Of Vector Floating-Point Arithmetic Accelerator Based On IEEE-754 Standard

Posted on: 2021-04-05
Degree: Master
Type: Thesis
Country: China
Candidate: B Liu
Full Text: PDF
GTID: 2518306050470124
Subject: Master of Engineering
Abstract/Summary:
In recent years, following theoretical science and experimental science, high-performance computing has become the third paradigm of scientific research. High-performance computing comprises scientific computing and high-performance embedded computing, and floating-point matrix operations have received much attention as the basic operations in high-performance embedded systems. Designing and implementing hardware accelerators for floating-point matrix operations is both an active research topic and a difficult problem in the field. One effective way to accelerate large-scale matrix operations is to build a dedicated ASIC and to partition compute-intensive, memory-access-intensive, and data-intensive matrix operations at a fine granularity so as to improve resource utilization. Based on an in-depth study of a 32-bit RISC general-purpose main processor and a general-purpose floating-point accelerator, this thesis designs and implements a vector floating-point arithmetic accelerator based on the RISC+SIMD architecture. It performs single- and double-precision floating-point matrix operations that conform to the IEEE-754 standard: addition, subtraction, multiplication, negative multiplication, multiply-add, multiply-subtract, negative multiply-add, and negative multiply-subtract. The main research work includes the following aspects:

1. A Vector Floating-point Arithmetic Accelerator based on the RISC+SIMD architecture is designed and implemented. It contains 16 concurrent FMAC units and 48 64-bit registers. The FMAC units are arranged with reference to the two-dimensional systolic-array matrix structure, and this structure is further optimized. While retaining the low power consumption and fast response of the original system, the design removes the data-fetch and execution-parallelism bottlenecks of a single General Floating-point Arithmetic Accelerator. The Vector Floating-point Arithmetic Accelerator thus effectively addresses the "computing power" problem faced by data-intensive operations.

2. A dedicated wide, fast access channel is opened between the Vector Floating-point Arithmetic Accelerator and the on-chip SRAM of the main processor, so that data is fetched directly from the on-chip SRAM. The Vector Floating-point Arithmetic Accelerator also provides an AHB master interface for accessing data in memory on the bus. Only a small number of control instructions go through the coprocessor channel, which effectively addresses the "data throughput efficiency" problem faced by data-intensive operations.

3. Based on the hardware structure of the Vector Floating-point Arithmetic Accelerator, and drawing on the ideas of the Goto-BLAS function library, a general-purpose GEMM assembly function library is designed and optimized. It can partition operations on matrices of arbitrary dimensions. Test results show that floating-point matrix operations on the Vector Floating-point Arithmetic Accelerator run 10 to 44 times faster with the assembly function library than with the C language function library; on the General Floating-point Arithmetic Accelerator, the assembly function library is 2 to 10 times faster than the C language function library.

4. Detailed performance tests and comparisons are carried out for single- and double-precision floating-point matrix addition/subtraction, transposition, and multiplication. The results show that the performance of the Vector Floating-point Arithmetic Accelerator is 3.1 to 3.5 times, 2.5 to 2.9 times, and 6.1 to 7.6 times that of the General Floating-point Arithmetic Accelerator, respectively. The Vector Floating-point Arithmetic Accelerator therefore achieves good acceleration of floating-point matrix operations.

5. Finally, the Vector Floating-point Arithmetic Accelerator designed in this thesis is synthesized using the SMIC 40 nm CMOS process library. Its synthesized area is 1.1100863088 mm², the operating frequency reaches 600 MHz, and the total power consumption is 719.3 mW. Its floating-point matrix operation performance can exceed 2000 MFLOPS.
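
The abstract does not give the partitioning scheme used by the GEMM assembly function library. As a rough illustration of the Goto-BLAS-style idea it refers to, the sketch below tiles C += A * B into blocks and handles edge tiles so that arbitrary matrix dimensions work. It is plain Python, not the thesis's assembly code; the function name `gemm_blocked` and the block sizes `mc`, `kc`, `nc` are hypothetical and are not taken from the thesis.

```python
def gemm_blocked(A, B, C, mc=4, kc=4, nc=4):
    """Accumulate C += A * B using a tiled loop nest (lists of lists).

    mc, kc, nc are illustrative block sizes (assumptions); a real library
    would choose them to match the register file and on-chip memory.
    """
    m, k = len(A), len(A[0])
    if len(B) != k:
        raise ValueError("inner dimensions of A and B do not match")
    n = len(B[0])

    for jc in range(0, n, nc):              # column panel of B and C
        j_end = min(jc + nc, n)             # clip the last (edge) tile
        for pc in range(0, k, kc):          # slice of the shared dimension
            p_end = min(pc + kc, k)
            for ic in range(0, m, mc):      # row block of A and C
                i_end = min(ic + mc, m)
                # Inner kernel: one multiply-add per (i, p, j) in this tile.
                for i in range(ic, i_end):
                    row_a, row_c = A[i], C[i]
                    for p in range(pc, p_end):
                        a = row_a[p]
                        row_b = B[p]
                        for j in range(jc, j_end):
                            row_c[j] += a * row_b[j]
    return C
```

Calling `gemm_blocked(A, B, C)` with a zero-initialized `C` gives the plain product; the `min(...)` clipping is what lets dimensions that are not multiples of the block sizes pass through the same loop nest.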
Keywords/Search Tags: Vector Floating-point Arithmetic Accelerator, IEEE-754, Systolic Array, GEMM Function Library