
Research and Implementation of a Hardware Accelerator for UNET Using a Fast Convolution Algorithm

Posted on: 2022-10-14
Degree: Master
Type: Thesis
Country: China
Candidate: W B Li
Full Text: PDF
GTID: 2518306725490594
Subject: Microelectronics and Solid State Electronics
Abstract/Summary:
In recent years, while convolutional neural networks have shown outstanding performance in a wide range of computing tasks, they have also developed rapidly toward deeper networks, smaller convolution kernels, and modified network structures that address gradient problems. As a derivative of the Fully Convolutional Network, UNET achieves remarkable results in image segmentation and image denoising; at the same time, its deep structure places heavy demands on computing power and storage during both training and inference. Training can rely on high-performance computing clusters to overcome these limitations, but during inference such clusters cannot satisfy the application constraints on computing precision, speed, and space. Using an FPGA to accelerate UNET inference not only meets the needs of complex application environments but also makes full use of the FPGA's high speed, high energy efficiency, and high flexibility.

This paper first introduces the components of the UNET network and modifies the original UNET according to the characteristics of hardware computing. UNET is trained with the open-source TensorFlow framework, and the trained parameters are quantized to fixed point through a shifting process so that they can be deployed on the hardware accelerator.

Secondly, since the FPGA platform provides only a limited number of DSPs for multiplication, the Winograd fast convolution algorithm, which reduces the number of multiplications, is introduced into this work. Based on the characteristics of the Winograd algorithm, a fast hardware scheme for the transform and inverse transform is applied, and transform methods for different kinds of data are discussed. By studying and optimizing the Winograd loop structure, the direction for increasing computing parallelism is determined, and the performance of reading and writing data is improved through loop interchange. For on-chip/off-chip data interaction, the DRAM access volume under different data reuse modes is analyzed, and the reuse mode that minimizes DRAM accesses is selected for the UNET network using the fast convolution algorithm.

Finally, based on the above work, this paper proposes a hardware accelerator for the UNET network using the fast convolution algorithm. The parallelism of the accelerator can be configured according to the available resources, and the internal computing units can be reused across layers with different workloads. Dedicated storage blocks and a pipelined design improve computational efficiency, which is ultimately reflected in reduced inference time and power consumption.

The proposed accelerator framework is implemented in a hardware description language, deployed on the ZC706 evaluation kit, and evaluated with UNET inference acceleration experiments. The results show that the inference time of the accelerator using the fast convolution algorithm is about 3.66 s, the power consumption is 22.936 W, and the average computing throughput reaches 55.6269 GMAC/s (an equivalent value). In comparison, the inference time on a CPU platform is 3.15 times that of the FPGA accelerator for the same workload, and the energy efficiency of a GPU platform is only half that of the FPGA accelerator, which proves that the FPGA accelerator achieves its goal of accelerating computation. In addition, a mathematical model of the accelerator's resource usage is established, and directions in which the design can be improved are analyzed.
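As a concrete illustration of the shift-based fixed-point quantization mentioned above: the abstract does not state the bit widths or per-layer scaling actually used, so the following Python sketch only shows what power-of-two (shift) scaling typically looks like, with total_bits and frac_bits chosen purely for illustration.

import numpy as np

def quantize_shift(w, frac_bits=12, total_bits=16):
    # Scale by 2^frac_bits (a left shift in hardware), round, and saturate
    # to the signed fixed-point range; the bit widths here are illustrative only.
    q = np.round(w * (1 << frac_bits)).astype(np.int64)
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    return np.clip(q, lo, hi).astype(np.int16)

def dequantize_shift(q, frac_bits=12):
    # Recovering the approximate real value is just a right shift in hardware.
    return q.astype(np.float64) / (1 << frac_bits)

weights = np.array([0.7312, -0.0419, 1.25, -2.0])
q = quantize_shift(weights)
print(q, dequantize_shift(q))

Because the scale factor is a power of two, rescaling reduces to shifts, which keeps the dequantization logic on the FPGA trivial.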
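Similarly, the tile size and hardware mapping chosen for the Winograd fast convolution are not given in this abstract; the sketch below shows only the standard one-dimensional F(2,3) case, in which the transform matrices B^T, G, and A^T produce two outputs of a 3-tap convolution with 4 element-wise multiplications instead of 6.

import numpy as np

# Standard Winograd F(2,3) transform matrices.
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)
G  = np.array([[1.0,  0.0, 0.0],
               [0.5,  0.5, 0.5],
               [0.5, -0.5, 0.5],
               [0.0,  0.0, 1.0]])
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)

def winograd_f2_3(d, g):
    # Transform the 3-tap kernel (can be precomputed offline) and the
    # 4-element input tile, multiply element-wise (the only multiplications),
    # then transform back to the output domain.
    U = G @ g
    V = BT @ d
    return AT @ (U * V)

# Check against direct sliding-window convolution of the same tile.
d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([0.5, 1.0, -1.0])
direct = np.array([d[0:3] @ g, d[1:4] @ g])
assert np.allclose(winograd_f2_3(d, g), direct)

In the two-dimensional F(2x2, 3x3) form commonly used for CNN layers, the same idea reduces the 36 multiplications of a directly computed 2x2 output tile to 16, which is why the algorithm eases the pressure on the FPGA's limited DSP resources.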
Keywords/Search Tags: UNET, fast convolution algorithm, hardware acceleration, FPGA