
Optimal Design Of A Sparse Convolutional Neural Network Inference Accelerator Based On FPGA

Posted on: 2023-05-02
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Zeng
Full Text: PDF
GTID: 2558307100475454
Subject: Integrated circuit engineering
Abstract/Summary:
As one of the hotspots of artificial intelligence research, convolutional neural networks can learn features from big data and generalize the results to unseen data of the same type, and they perform well in fields such as image recognition and image segmentation. However, convolutional neural networks often face the thorny problems of large model size and high computational complexity, which limit their practical application to a certain extent. As model complexity grows, so does the amount of redundant computation, most of which comes from operations whose activation or weight is zero. Exploiting this activation redundancy can effectively reduce the amount of computation without reducing model accuracy. The FPGA, as a highly parallel hardware platform that can effectively accelerate convolutional neural network inference, has attracted much attention in industry. However, two problems remain to be solved when accelerating convolutional neural networks on FPGAs: sparse activations are not reasonably utilized, and computational efficiency is low because of load imbalance in sparse computation. The research in this thesis focuses on the optimized design of sparse convolutional neural network accelerator architectures. The specific contributions are as follows:

(1) An efficient sparse convolutional neural network dataflow and its data layout are designed. The coarse-grained control of the dataflow designed in this thesis is realized by a dense dataflow, while the fine-grained control is realized by a sparse dataflow. This strategy greatly reduces the resource consumption of dataflow control and enables the skipping of zero activations. Based on this dataflow, a data layout format is designed that reduces the access latency of off-chip data by guaranteeing a long burst length.

(2) A convolutional neural network accelerator architecture supporting sparse activations is designed. A multiply-accumulate array achieves high data reuse through input-channel parallelism and output-channel parallelism. The architecture includes an intelligent data distributor that allocates a balanced computing load to each computing unit and reduces the utilization of computing resources, lowering power consumption without reducing performance. In addition, the adder tree is optimized to reduce the load imbalance caused by sparse computation. Experimental results show that the computation speed is improved by 2.5×, and more than 97% of the operations involve non-zero activations.

(3) A sparse convolutional neural network accelerator architecture with low off-chip bandwidth dependence is designed. A data-loopback strategy reduces the number of off-chip activation accesses: it makes full use of the characteristics of adjacent convolution layers and avoids using off-chip memory to buffer inter-layer results. An input-channel expansion technique is also used, which increases the computational efficiency of the accelerator. Experimental results show that the accelerator achieves a computing performance of up to 315.8 GOP/s.

(4) A sparse convolutional neural network accelerator architecture with flexible parallelism is designed. The parallelism dimensions are extended to output-channel parallelism, input-channel parallelism, and output-activation parallelism, and the latter two dimensions can be flexibly configured according to the layer structure to reduce the impact of network fragmentation. This thesis analyzes the data dependence of output-activation parallelism, and a data-allocation strategy is designed to reduce off-chip memory accesses. Experimental results show that the simulated computing performance reaches 325.8 GOP/s.
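The zero-activation skipping that the abstract describes can be illustrated with a minimal software sketch: instead of scheduling a multiply-accumulate for every input channel, only the nonzero activations are dispatched to the compute units, so the MAC count shrinks in proportion to sparsity. This is a hypothetical illustration in NumPy, not the thesis's actual hardware dataflow; the function name and shapes are assumptions for the example.

```python
import numpy as np

def conv1x1_zero_skip(activations, weights):
    """Multiply-accumulate over input channels, skipping zero activations.

    activations: (C_in,) input activations for one output pixel
    weights:     (C_out, C_in) weight matrix
    Returns (outputs, macs_performed) so the savings can be inspected.
    """
    c_out, c_in = weights.shape
    outputs = np.zeros(c_out)
    macs = 0
    # Fine-grained sparse control: only nonzero activations are scheduled.
    for c in np.flatnonzero(activations):
        outputs += weights[:, c] * activations[c]
        macs += c_out
    return outputs, macs

# Example: 5 of 8 activations are zero (as is common after ReLU),
# so only 3/8 of the dense MAC count (32) is actually performed.
acts = np.array([0.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0, -0.5])
w = np.ones((4, 8))
out, macs = conv1x1_zero_skip(acts, w)
print(macs, w.size)  # 12 32
```

In hardware, the same idea requires the load-balancing logic the thesis describes, because different activation vectors have different numbers of nonzeros and would otherwise leave some compute units idle.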
Keywords/Search Tags: Convolutional neural network, FPGA, Hardware accelerator, Sparse operation