
Neural Network Compression And Acceleration For FPGA Implementation

Posted on: 2021-01-10
Degree: Master
Type: Thesis
Country: China
Candidate: J Chen
Full Text: PDF
GTID: 2518306551952649
Subject: Master of Engineering
Abstract/Summary:
As deep learning tasks grow more complex, the corresponding Convolutional Neural Network (CNN) models have become increasingly bloated and difficult to run in real time on embedded hardware. Reducing a CNN's parameters to "slim down" the model is commonly referred to as network compression. In recent years, methods such as pruning, quantization, and coding have been proposed to make neural networks lightweight. Although pruning and coding can greatly reduce the number of network parameters, they cannot speed up the computation of a neural network. Quantization can both accelerate computation and reduce parameter storage, which makes it a good carrier for FPGA hardware implementation. Quantization must solve two key problems: first, the vanishing-gradient problem that arises when training a Quantized Neural Network (QNN); second, achieving accuracy comparable to the original full-precision CNN. To address these problems, this thesis proposes a gradient estimator for training QNNs and implements the forward computation of the QNN on a Xilinx ZCU102 FPGA.

The main innovations of this thesis are as follows:

1. A method for quantizing CNNs. From the perspective of probability and statistics, this thesis recomputes the gradient of the quantizer and shows that its expectation is non-zero, which avoids the vanishing-gradient problem. In other words, during training the loss function can decrease until convergence, just as with full-precision CNNs (a simplified sketch of this idea is given below).

2. An FPGA implementation structure for QNNs. On the one hand, the memory burden on the FPGA is reduced through parameter storage and interaction schemes tailored to QNNs. On the other hand, owing to the special form of the quantized weights, the multiply-accumulate (MAC) operations of the convolution can be replaced with XOR or shift operations, saving computational resources and increasing the degree of parallelism (also sketched below). On this basis, the thesis optimizes the computation flow and network scale, and redesigns the circuit structure of each module of the convolutional network.
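The abstract does not reproduce the thesis's estimator, so the following is only a minimal PyTorch sketch of the general idea it describes: a quantizer whose backward pass is non-zero, so the loss keeps decreasing during training. The stand-in here is the well-known straight-through estimator with clipping; the class name SignQuant and the clipping rule are illustrative assumptions, not the thesis's method.

```python
import torch

class SignQuant(torch.autograd.Function):
    """Binarizing quantizer with a straight-through-style gradient.

    A minimal sketch: sign() has zero derivative almost everywhere,
    which would stall training (the vanishing-gradient problem the
    thesis targets), so the backward pass substitutes a non-zero
    surrogate gradient inside the clipping region [-1, 1].
    """

    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)  # quantized forward pass

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        # Pass the upstream gradient through unchanged where |w| <= 1,
        # instead of the true (zero) derivative of sign().
        return grad_out * (w.abs() <= 1.0).float()

# Usage: gradients flow through the quantizer, so the loss can decrease.
w = torch.randn(4, requires_grad=True)
q = SignQuant.apply(w)
q.sum().backward()
print(w.grad)  # non-zero inside the clipping region
```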
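To illustrate how MACs reduce to shifts when weights are constrained to signed powers of two, here is a plain-Python sketch of the arithmetic the FPGA structure exploits. The function name shift_mac and the example values are hypothetical; the thesis's actual circuit design is not shown here.

```python
def shift_mac(activations, shift_exps, signs):
    """Accumulate activation * weight where weight = sign * 2**exp.

    Each multiply becomes a left shift, so no hardware multiplier
    is needed -- the saving the thesis exploits on the FPGA.
    """
    acc = 0
    for a, k, s in zip(activations, shift_exps, signs):
        acc += s * (a << k)  # a * 2**k computed as a shift
    return acc

# Hypothetical integer activations and power-of-two weights 1, -2, 4, -1:
acts  = [3, 5, 2, 7]
exps  = [0, 1, 2, 0]
signs = [1, -1, 1, -1]
print(shift_mac(acts, exps, signs))  # 3 - 10 + 8 - 7 = -6
```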
Keywords/Search Tags:Deep learning, network compression, Quantized Neural Network, FPGA