
A Convolutional Neural Network Accelerator Based On FPGA

Posted on: 2022-09-20
Degree: Master
Type: Thesis
Country: China
Candidate: J H Zhang
Full Text: PDF
GTID: 2518306524980239
Subject: Computer Science and Technology
Abstract/Summary:
Image classification and recognition has become an important part of artificial intelligence and a hot research topic. Convolutional neural networks (CNNs) are a key technology for image classification and recognition and are widely deployed on GPUs, CPUs, and other platforms. To meet different deployment requirements, especially on embedded mobile terminals, power consumption, volume, and other factors must be considered comprehensively, so a CPU or GPU is not well suited to such work. The FPGA has become an important choice for CNN hardware acceleration because of its high performance and low power consumption. This thesis implements a CNN accelerator based on an FPGA, which provides a general and efficient forward-inference function. The specific work of this thesis is as follows:

1. Common acceleration schemes are studied, the space for accelerating CNNs on FPGAs is explored, and the advantages and disadvantages of each scheme are analyzed.

2. A universal CNN accelerator architecture is designed and implemented, comprising five parts: control, calculation, storage, I/O, and an operator library. In the storage part, block storage and a double-buffer mechanism enable highly efficient reading and writing of image data, and a fragmented BRAM control mode improves BRAM utilization while enhancing both the parallelism of the network and the generality of the architecture. In the control part, the accelerator can be easily reconfigured by parameter transfer, which meets the acceleration requirements of most networks.

3. The convolution operator is optimized: computation is parallelized over the input and output channels, convolution-kernel parameters are reused, and the parallelism of the convolution calculation is improved. An optimization scheme searches for the parallel strategy of each layer, which reduces off-chip data accesses and saves read/write cost. In addition, a three-stage pipeline is introduced, data transmission is realized with a line buffer, and accumulation is realized with a binary adder tree, yielding a high-performance convolution operator.

4. Finally, the accelerator is deployed on a ZCU102 embedded development board. Experiments show that the accelerator architecture implemented in this thesis has good generality and effectively improves inference performance, and it also performs well in BRAM utilization: the real-time utilization rate reaches 72.16%, the peak performance of the convolution operator reaches 184.03 GOP/s, and the average performance reaches 167.20 GOP/s, surpassing many existing schemes.
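The double-buffer (ping-pong) mechanism mentioned in item 2 can be illustrated with a minimal software sketch. This is not the thesis's code: `Tile`, `process_tiles`, and the stand-in reduction are hypothetical names introduced only to show how computing on one buffer while filling the other lets data transfer and computation overlap.

```cpp
#include <array>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical tile type: one block of image data staged on-chip.
using Tile = std::vector<int>;

// Ping-pong (double) buffering: while the compute stage consumes one
// buffer, the I/O stage fills the other, so memory transfer and
// computation overlap instead of strictly alternating.
long process_tiles(const std::vector<Tile>& tiles) {
    if (tiles.empty()) return 0;
    std::array<Tile, 2> buf;          // the two on-chip buffers
    long checksum = 0;
    buf[0] = tiles[0];                // prologue: preload buffer 0
    for (std::size_t i = 0; i < tiles.size(); ++i) {
        std::size_t cur = i % 2;      // buffer being computed on
        std::size_t nxt = (i + 1) % 2;
        // "Load" the next tile into the idle buffer; on hardware this
        // transfer would run concurrently with the compute below.
        if (i + 1 < tiles.size()) buf[nxt] = tiles[i + 1];
        // Compute on the current buffer (a stand-in reduction here).
        checksum += std::accumulate(buf[cur].begin(), buf[cur].end(), 0L);
    }
    return checksum;
}
```

On an FPGA the load and compute in each iteration execute in parallel, so the steady-state cost per tile is max(load, compute) rather than their sum.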
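The channel-level parallelism and binary-tree accumulation of item 3 can likewise be sketched in plain C++. The function names (`tree_sum`, `conv_pixel`) are illustrative, not from the thesis; the example computes one output pixel of a 1x1 convolution, where the per-channel multiplications are independent (unrolled into parallel multipliers on hardware) and the partial products are reduced by an adder tree of logarithmic depth.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Binary-tree accumulation: sum N partial products in about log2(N)
// adder stages instead of a length-N chain, shortening the critical
// path of the accumulation.
int tree_sum(std::vector<int> v) {
    while (v.size() > 1) {
        std::vector<int> next;
        for (std::size_t i = 0; i + 1 < v.size(); i += 2)
            next.push_back(v[i] + v[i + 1]);   // one adder stage
        if (v.size() % 2) next.push_back(v.back());
        v = std::move(next);
    }
    return v.empty() ? 0 : v[0];
}

// One output pixel of a 1x1 convolution over C input channels: the C
// multiplications are independent, and their products are reduced
// with the adder tree above.
int conv_pixel(const std::vector<int>& in, const std::vector<int>& w) {
    std::vector<int> prod(in.size());
    for (std::size_t c = 0; c < in.size(); ++c)
        prod[c] = in[c] * w[c];                // parallel multipliers
    return tree_sum(prod);
}
```

With 8 input channels, a chained accumulation needs 7 sequential additions, while the tree needs only 3 stages; this depth reduction is what makes the tree attractive for a pipelined convolution operator.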
Keywords/Search Tags: Convolutional neural network, FPGA hardware acceleration, High-performance operator, Memory access optimization