
Research On Acceleration Of Low-Precision Convolutional Neural Networks On FPGA

Posted on: 2020-10-05
Degree: Master
Type: Thesis
Country: China
Candidate: D Qi
Full Text: PDF
GTID: 2428330590483214
Subject: Computer technology
Abstract/Summary:
With the development of deep learning theory and application technology, convolutional neural networks have achieved great success in speech recognition, image processing, and natural language processing. However, the growing computational scale and increasingly complex model structures of convolutional neural networks have become a bottleneck for their deployment on mobile and embedded devices. Recent studies have shown that quantized convolutional neural networks can significantly reduce parameter size and computational cost. While preserving accuracy, the weights and hidden-layer activations are binarized to +1 or -1 during training, and the binarized weights and activations are used to compute the parameter gradients. This binary quantization theoretically reduces memory consumption to 1/32 of that of the full-precision model. More importantly, XNOR and popcount operations can replace the original convolution arithmetic, which greatly reduces computation time.

This paper combines the programmability, reconfigurability, and low power consumption of FPGAs with binarized training methods to accelerate an improved VGG16-based network on the Xilinx PYNQ-Z1 lightweight development board, implemented with the Vivado HLS high-level synthesis tool. Corresponding optimizations are applied to the convolutional, pooling, batch normalization, and fully connected layers. A Matrix-Vector Unit is designed whose numbers of processing elements (PEs) and SIMD channels can be controlled, so that each layer achieves its best local performance and the model attains the best overall performance. With these optimizations, the design achieves higher data throughput, faster processing, and lower power consumption than previous work.

A performance comparison of several different quantization configurations is also provided. The results show that higher quantization precision yields higher recognition accuracy, but requires more model memory. Finally, applying the acceleration scheme and quantization approach of this paper, an object detection network based on SqueezeNet is implemented, achieving 85.7% accuracy, 31.8 FPS, and 2.4 W power consumption. Following this scheme, more complex network models can later be deployed on different FPGAs, and even on ARM and mobile GPU platforms.
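The XNOR-and-popcount substitution described above can be illustrated with a short sketch. The C++ fragment below is a minimal illustration (not code from the thesis): a dot product over 64 values in {-1, +1}, bit-packed into one machine word each, reduces to an XNOR marking the positions where the two signs agree, followed by a popcount that maps the match count back to the ±1 domain via 2·popcount − N.

```cpp
#include <bit>      // std::popcount (C++20)
#include <cstdint>

// Dot product of two 64-element {-1, +1} vectors, each packed into one
// 64-bit word (bit = 1 encodes +1, bit = 0 encodes -1).
// XNOR yields a 1 wherever the two signs agree, i.e. where the
// elementwise product is +1; with N = 64 bits in total,
//   result = (+1) * matches + (-1) * (N - matches) = 2 * matches - N.
inline int binary_dot64(std::uint64_t a, std::uint64_t b) {
    std::uint64_t agree = ~(a ^ b);      // XNOR: 1 where signs match
    int matches = std::popcount(agree);  // count the +1 products
    return 2 * matches - 64;
}
```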
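Likewise, the PE/SIMD folding performed by the Matrix-Vector Unit can be sketched in software. The C++ below is a hypothetical illustration (the names, sizes, and the FOLDS parameter are assumptions, not the thesis implementation): PE output rows are accumulated in parallel, while each input vector is folded into SIMD-bit chunks consumed one per cycle; in an actual HLS design, pragmas such as PIPELINE, UNROLL, and ARRAY_PARTITION would realize this parallelism in hardware.

```cpp
#include <bit>
#include <cstdint>

constexpr int PE    = 4;   // processing elements: output rows computed in parallel
constexpr int SIMD  = 64;  // input bits consumed per fold (one 64-bit word here)
constexpr int FOLDS = 8;   // folds per input vector => 8 * 64 = 512 inputs

// Binarized matrix-vector product: PE rows of a 512-wide binary matrix
// against one 512-wide binary input, all values in {-1, +1} and bit-packed.
// In an HLS version, the fold loop would carry "#pragma HLS PIPELINE II=1"
// and the PE loop would be fully unrolled so each PE owns one accumulator.
void mvu(const std::uint64_t weights[PE][FOLDS],
         const std::uint64_t input[FOLDS],
         int out[PE]) {
    for (int p = 0; p < PE; ++p) out[p] = 0;
    for (int f = 0; f < FOLDS; ++f) {        // fold over the input vector
        for (int p = 0; p < PE; ++p) {       // parallel across PEs in hardware
            std::uint64_t agree = ~(weights[p][f] ^ input[f]);
            out[p] += 2 * std::popcount(agree) - SIMD;
        }
    }
}
```

Raising PE increases output parallelism at the cost of more hardware, while raising SIMD widens each PE's input lane; tuning the two per layer is what lets each layer reach its best local performance.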
Keywords/Search Tags:Convolutional Neural Network, FPGA, Binarized Quantization, Matrix-Vector Unit