
Research On Acceleration Of Low-Precision Convolutional Neural Networks On FPGA

Posted on: 2020-10-05
Degree: Master
Type: Thesis
Country: China
Candidate: D Qi
Full Text: PDF
GTID: 2428330590483214
Subject: Computer technology
Abstract/Summary:
With the development of deep learning theory and application technology, convolutional neural networks have achieved great success in speech recognition, image processing, and natural language processing. However, the growing computational scale and increasingly complex model structures of convolutional neural networks have become a bottleneck for their deployment on mobile and embedded devices. Recent studies have shown that quantized convolutional neural networks can significantly reduce parameter size and computational cost. While preserving accuracy, the weights and hidden-layer activations are binarized to +1 or -1 during training, and the binarized weights and activations are used to compute the parameter gradients. This binary quantization theoretically reduces memory consumption to 1/32 of that of the full-precision model. More importantly, XNOR and popcount operations can replace the original convolution arithmetic, which greatly reduces computation time.

This paper combines the programmability, reconfigurability, and low power consumption of FPGAs with binarized training methods to accelerate an improved VGG16-based network on the Xilinx PYNQ-Z1 lightweight development board, implemented with the Vivado HLS high-level synthesis tool. Corresponding optimizations are applied to the convolutional, pooling, batch normalization, and fully connected layers. A Matrix-Vector Unit is designed whose numbers of processing elements (PEs) and SIMD channels can be controlled, so that each layer achieves its best local performance and the model attains the best overall performance. With these optimizations, the design achieves higher data throughput, faster processing, and lower power consumption than previous work.

A performance comparison of several different quantization configurations is also provided. The results show that higher quantization precision yields higher recognition accuracy, but requires more model memory. Finally, applying the acceleration scheme and quantization approach of this paper, an object detection network based on SqueezeNet is implemented, achieving 85.7% accuracy, 31.8 FPS, and 2.4 W power consumption. Following this scheme, more complex network models can later be deployed on different FPGAs, and even on ARM and mobile GPU platforms.
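The XNOR-and-popcount substitution described above can be illustrated with a short sketch. The C++ fragment below is a minimal illustration (not code from the thesis): a dot product over 64 values in {-1, +1}, bit-packed into one machine word each, reduces to an XNOR marking the positions where the two signs agree, followed by a popcount that maps the match count back to the ±1 domain via 2·popcount − N.

```cpp
#include <bit>      // std::popcount (C++20)
#include <cstdint>

// Dot product of two 64-element {-1, +1} vectors, each packed into one
// 64-bit word (bit = 1 encodes +1, bit = 0 encodes -1).
// XNOR yields a 1 wherever the two signs agree, i.e. where the
// elementwise product is +1; with N = 64 bits in total,
//   result = (+1) * matches + (-1) * (N - matches) = 2 * matches - N.
inline int binary_dot64(std::uint64_t a, std::uint64_t b) {
    std::uint64_t agree = ~(a ^ b);      // XNOR: 1 where signs match
    int matches = std::popcount(agree);  // count the +1 products
    return 2 * matches - 64;
}
```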
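Likewise, the PE/SIMD folding performed by the Matrix-Vector Unit can be sketched in software. The C++ below is a hypothetical illustration (the names, sizes, and the FOLDS parameter are assumptions, not the thesis implementation): PE output rows are accumulated in parallel, while each input vector is folded into SIMD-bit chunks consumed one per cycle; in an actual HLS design, pragmas such as PIPELINE, UNROLL, and ARRAY_PARTITION would realize this parallelism in hardware.

```cpp
#include <bit>
#include <cstdint>

constexpr int PE    = 4;   // processing elements: output rows computed in parallel
constexpr int SIMD  = 64;  // input bits consumed per fold (one 64-bit word here)
constexpr int FOLDS = 8;   // folds per input vector => 8 * 64 = 512 inputs

// Binarized matrix-vector product: PE rows of a 512-wide binary matrix
// against one 512-wide binary input, all values in {-1, +1} and bit-packed.
// In an HLS version, the fold loop would carry "#pragma HLS PIPELINE II=1"
// and the PE loop would be fully unrolled so each PE owns one accumulator.
void mvu(const std::uint64_t weights[PE][FOLDS],
         const std::uint64_t input[FOLDS],
         int out[PE]) {
    for (int p = 0; p < PE; ++p) out[p] = 0;
    for (int f = 0; f < FOLDS; ++f) {        // fold over the input vector
        for (int p = 0; p < PE; ++p) {       // parallel across PEs in hardware
            std::uint64_t agree = ~(weights[p][f] ^ input[f]);
            out[p] += 2 * std::popcount(agree) - SIMD;
        }
    }
}
```

Raising PE increases output parallelism at the cost of more hardware, while raising SIMD widens each PE's input lane; tuning the two per layer is what lets each layer reach its best local performance.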
Keywords/Search Tags:Convolutional Neural Network, FPGA, Binarized Quantization, Matrix-Vector Unit