
Design And Implementation Of Convolutional Neural Network Accelerator Based On Affine Quantization

Posted on: 2020-01-31
Degree: Master
Type: Thesis
Country: China
Candidate: C L Zeng
Full Text: PDF
GTID: 2518306518963679
Subject: Microelectronics and Solid State Electronics
Abstract/Summary:
In recent years, convolutional neural networks (CNNs) have been widely used in speech recognition, object detection, and image segmentation. With the rapid development of CNN algorithms, the compute-intensive and memory-intensive nature of large-scale CNNs poses many challenges for their application. At present, CNNs are mainly deployed on cloud servers; terminal data must be transmitted to the server for processing, which causes high power consumption and high latency. To solve these problems, FPGA-based CNN accelerators have gradually become a research hotspot. However, FPGA platforms are limited by on-chip resources and off-chip memory bandwidth, so under limited resources it is of great significance to compress CNN models in order to implement high-performance CNN accelerators.

First, this work analyzes the theory of affine quantization and applies it to the CNN inference process. Based on an analysis of the sources of precision loss introduced by affine quantization, different methods for obtaining the quantization parameters are proposed. Quantized CNN inference is implemented in TensorFlow, and the influence of different quantization parameters and quantization precisions on top-1 accuracy is analyzed. To improve the accuracy of the quantized CNN, a mixed-precision quantization scheme is proposed. The experimental results show that, with appropriate quantization parameters, the activations and weights can be quantized to 8 bits with less than 1% loss in accuracy.

Second, a high-performance quantized-CNN accelerator based on a Zynq-7000 series FPGA is implemented in this paper. According to the characteristics of CNNs and the embedded FPGA platform, a hardware-software co-design architecture is proposed. Under the constraint of limited on-chip resources, both DSPs and LUTs are used to implement multipliers; the parallelism and performance of the accelerator are analyzed, and an appropriate degree of parallelism is selected. For the 1×1 convolution operation, a multiplexed-parallelism design is proposed. To improve DSP utilization, an optimization strategy is proposed that implements two 8-bit multipliers with a single DSP. To improve the efficiency of data blocking, a two-dimensional DMA blocking strategy is proposed.

The experimental results show that the average performance of the CNN accelerator proposed in this paper reaches 416.3 GOPS, several times higher than previous designs based on the same FPGA platform. The performance of the design is 3.75 times higher than that of the CPU, and the energy efficiency is 1.42 times higher than that of the GPU.
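The two techniques at the heart of this abstract, affine quantization and sharing one wide DSP multiplier between two 8-bit products, can be sketched as follows. This is a minimal software emulation for unsigned operands; the function names, the range-clipping policy, and the 16-bit packing gap are illustrative assumptions, not details of the thesis implementation (the signed case on real DSP48 hardware needs additional correction terms).

```python
import numpy as np

# --- 1. Affine quantization: r ≈ scale * (q - zero_point) -------------------

def quantize(r, num_bits=8):
    """Map a float array onto unsigned num_bits integers."""
    qmin, qmax = 0, 2 ** num_bits - 1
    rmin, rmax = min(r.min(), 0.0), max(r.max(), 0.0)   # range must cover 0
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = int(round(qmin - rmin / scale))        # real 0 maps exactly
    q = np.clip(np.round(r / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return scale * (q.astype(np.float32) - zero_point)

# --- 2. Two 8-bit multiplications on one wide multiplier --------------------

def packed_multiply(a, b, c):
    """Compute a*c and b*c with a single wide multiplication.

    a, b, c are unsigned 8-bit values. a and b are packed into one operand
    with a 16-bit gap, so the partial products a*c*2**16 and b*c never
    overlap (b*c < 2**16 always holds for 8-bit inputs).
    """
    product = ((a << 16) | b) * c    # one multiplication: a*c*2**16 + b*c
    return product >> 16, product & 0xFFFF

x = np.array([-1.0, 0.0, 0.5, 2.0], dtype=np.float32)
q, s, z = quantize(x)
x_hat = dequantize(q, s, z)          # per-element error is at most scale/2

ac, bc = packed_multiply(200, 37, 251)   # ac == 200*251, bc == 37*251
```

Mapping real zero exactly onto an integer `zero_point` is what makes affine (asymmetric) quantization attractive for CNNs: zero padding and ReLU outputs incur no quantization error. The 16-bit gap in `packed_multiply` mirrors how two 8-bit weights can share one DSP multiplication, roughly doubling effective DSP throughput.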
Keywords/Search Tags: Convolutional neural network, Accelerator, Affine quantization, FPGA