The Design And Performance Optimization Of Vector Floating-Point Arithmetic Accelerator Based On IEEE-754 Standard

Posted on:2021-04-05

Degree:Master

Type:Thesis

Country:China

Candidate:B Liu

Full Text:PDF

GTID:2518306050470124

Subject:Master of Engineering

Abstract/Summary:

In recent years, following theoretical science and experimental science, high-performance computing has become the third paradigm of scientific research. High-performance computing comprises scientific computing and high-performance embedded computing. Floating-point matrix operations, as the basic operations in high-performance embedded systems, have received much attention. The design and implementation of hardware accelerators for floating-point matrix operations is a current research hotspot and a difficult problem in the field. One effective way to accelerate large-scale matrix operations is to build a dedicated ASIC and to split compute-intensive, memory-access-intensive, and data-intensive matrix operations at a fine granularity so as to improve resource utilization.

Based on an in-depth study of a 32-bit RISC general-purpose main processor and a general-purpose floating-point accelerator, this thesis designs and implements a vector floating-point arithmetic accelerator based on a RISC+SIMD architecture. It supports single- and double-precision floating-point matrix operations that conform to the IEEE-754 standard: addition, subtraction, multiplication, negative multiplication, multiply-add, multiply-subtract, negative multiply-add, and negative multiply-subtract. The main research work includes the following aspects:

1. A Vector Floating-point Arithmetic Accelerator based on the RISC+SIMD architecture is designed and implemented. It contains 16 concurrent FMAC units and 48 64-bit registers. The FMAC units are arranged with reference to a two-dimensional systolic-array structure, which is then further optimized. While retaining the low power consumption and fast response of the original system, the design removes the bottlenecks of a single General Floating-point Arithmetic Accelerator in data fetching and parallel execution. The Vector Floating-point Arithmetic Accelerator thus effectively addresses the "computing power" problem faced by data-intensive operations.

2. A dedicated high-bit-width, fast access channel is opened between the Vector Floating-point Arithmetic Accelerator and the on-chip SRAM of the main processor, so that data is fetched directly from the on-chip SRAM. In addition, the accelerator has an AHB master interface for accessing data in storage on the bus. Only a small number of control instructions go through the coprocessor channel, which effectively addresses the "data throughput efficiency" problem faced by data-intensive operations.

3. Based on the hardware structure of the Vector Floating-point Arithmetic Accelerator, and drawing on the ideas of the Goto-BLAS function library, a general-purpose GEMM assembly function library is designed and optimized; it can partition matrix operations of arbitrary dimensions. The test results show that, on the Vector Floating-point Arithmetic Accelerator, floating-point matrix operations implemented with the assembly function library run 10 to 44 times as fast as those implemented with the C language function library; on the General Floating-point Arithmetic Accelerator, the assembly function library is 2 to 10 times as fast as the C language function library.

4. Detailed performance tests and comparative analysis are carried out for single- and double-precision floating-point matrix addition/subtraction, transposition, and multiplication. The results show that the performance of the Vector Floating-point Arithmetic Accelerator is 3.1 to 3.5 times, 2.5 to 2.9 times, and 6.1 to 7.6 times that of the General Floating-point Arithmetic Accelerator, respectively. The Vector Floating-point Arithmetic Accelerator therefore achieves good acceleration of floating-point matrix operations.

5. Finally, the Vector Floating-point Arithmetic Accelerator designed in this thesis is synthesized using the SMIC 40 nm CMOS process library. Its synthesized area is 1.1100863088 mm², its operating frequency reaches 600 MHz, and its total power consumption is 719.3 mW. Its floating-point matrix operation performance can exceed 2000 MFLOPS.
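
The partitioning idea in item 3 (splitting a matrix product of arbitrary dimensions into tiles, in the spirit of Goto-BLAS) can be illustrated with a minimal Python sketch. This is not the thesis's assembly library: the function names `gemm_naive` and `gemm_blocked` and the tile sizes `mc`, `kc`, `nc` are assumptions chosen for illustration, and the inner multiply-add is an ordinary multiply followed by an add, not a fused hardware FMAC operation.

```python
# Illustrative sketch only. Names and tile sizes are hypothetical and are
# not taken from the thesis or from the Goto-BLAS sources.

def gemm_naive(a, b):
    """Reference triple loop computing C = A * B on lists of lists."""
    if not a or not b or len(a[0]) != len(b):
        raise ValueError("inner dimensions must match and be non-empty")
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            c[i][j] = acc
    return c


def gemm_blocked(a, b, mc=4, kc=4, nc=4):
    """Tiled C = A * B that handles dimensions not divisible by the tile size.

    The three outer loops walk over tiles of C and of the shared dimension;
    min() clips the last tile in each direction, which is what lets the
    routine accept matrices of arbitrary shape.
    """
    if not a or not b or len(a[0]) != len(b):
        raise ValueError("inner dimensions must match and be non-empty")
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for jc in range(0, n, nc):              # column tiles of B and C
        j_end = min(jc + nc, n)
        for pc in range(0, k, kc):          # tiles along the shared dimension
            p_end = min(pc + kc, k)
            for ic in range(0, m, mc):      # row tiles of A and C
                i_end = min(ic + mc, m)
                # Micro-kernel: accumulate one tile's contribution into C.
                for i in range(ic, i_end):
                    for j in range(jc, j_end):
                        acc = c[i][j]
                        for p in range(pc, p_end):
                            acc += a[i][p] * b[p][j]   # multiply-add step
                        c[i][j] = acc
    return c


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    print(gemm_blocked(a, b, mc=1, kc=1, nc=1))  # [[19.0, 22.0], [43.0, 50.0]]
```

In a hardware-oriented library the tile sizes would be chosen to match the register file and the on-chip SRAM, and the micro-kernel would map onto the parallel FMAC units; here they are free parameters so that the clipping of edge tiles is easy to see.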

Keywords/Search Tags:

Vector Floating-point Arithmetic Accelerator, IEEE-754, Systolic Array, GEMM Function Library

Related items

1	A rigorous framework for fully supporting the IEEE standard for floating-point arithmetic in high-level programming languages
2	Key Technologies Of VLSI Implementation Of High Performance Floating-point Arithmetic Unit
3	Hardware Design And Implementation Of Floating-point Instruction Based On AltiVec
4	The Design And Implementation Of Floating Point Unit Based On ARMv7 Floating Point Instruction Set
5	A hierarchical verification of the IEEE-754 table-driven floating-point exponential function using HOL
6	The Key Technology Study For Super Precision Floating-Point Arithmetic
7	High-performance Floating-Point Unit Design
8	The Architecture And Implementation Of Arithmetic Clusters Based On Stream Applications
9	Implementation Of RISC-V Floating Point Instructions Based On ShenWei Architecture
10	Applications Of Arbitrary Precision Floating-Point Arithmetic In Delaunay Mesh Generation