
Neural Network Compression And Acceleration For FPGA Implementation

Posted on: 2021-01-10
Degree: Master
Type: Thesis
Country: China
Candidate: J Chen
Full Text: PDF
GTID: 2518306551952649
Subject: Master of Engineering
Abstract/Summary:
As deep learning tasks grow more complex, the corresponding Convolutional Neural Network (CNN) models have become increasingly bloated and difficult to run in real time on embedded hardware. Reducing a CNN's parameters to "slim down" the model is commonly referred to as network compression. In recent years, methods such as pruning, quantization, and coding have been proposed to make neural networks lightweight. Although pruning and coding can greatly reduce the number of network parameters, they cannot speed up the computation of a neural network. Quantization can both accelerate computation and reduce parameter storage, which makes it a good carrier for FPGA hardware implementation. Quantization must solve two key problems: first, the vanishing-gradient problem that arises when training a Quantized Neural Network (QNN); second, achieving accuracy comparable to the original full-precision CNN. To address these problems, this thesis proposes a gradient estimator for training QNNs and implements the forward computation of the QNN on a Xilinx ZCU102 FPGA.

The main innovations of this thesis are as follows:

1. A method for quantizing CNNs. From the perspective of probability and statistics, this thesis recomputes the gradient of the quantizer and shows that its expectation is non-zero, which avoids the vanishing-gradient problem. In other words, during training the loss function can decrease until convergence, just as with full-precision CNNs (a simplified sketch of this idea is given below).

2. An FPGA implementation structure for QNNs. On the one hand, the memory burden on the FPGA is reduced through parameter storage and interaction schemes tailored to QNNs. On the other hand, owing to the special form of the quantized weights, the multiply-accumulate (MAC) operations of the convolution can be replaced with XOR or shift operations, saving computational resources and increasing the degree of parallelism (also sketched below). On this basis, the thesis optimizes the computation flow and network scale, and redesigns the circuit structure of each module of the convolutional network.
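The abstract does not reproduce the thesis's estimator, so the following is only a minimal PyTorch sketch of the general idea it describes: a quantizer whose backward pass is non-zero, so the loss keeps decreasing during training. The stand-in here is the well-known straight-through estimator with clipping; the class name SignQuant and the clipping rule are illustrative assumptions, not the thesis's method.

```python
import torch

class SignQuant(torch.autograd.Function):
    """Binarizing quantizer with a straight-through-style gradient.

    A minimal sketch: sign() has zero derivative almost everywhere,
    which would stall training (the vanishing-gradient problem the
    thesis targets), so the backward pass substitutes a non-zero
    surrogate gradient inside the clipping region [-1, 1].
    """

    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)  # quantized forward pass

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        # Pass the upstream gradient through unchanged where |w| <= 1,
        # instead of the true (zero) derivative of sign().
        return grad_out * (w.abs() <= 1.0).float()

# Usage: gradients flow through the quantizer, so the loss can decrease.
w = torch.randn(4, requires_grad=True)
q = SignQuant.apply(w)
q.sum().backward()
print(w.grad)  # non-zero inside the clipping region
```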
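To illustrate how MACs reduce to shifts when weights are constrained to signed powers of two, here is a plain-Python sketch of the arithmetic the FPGA structure exploits. The function name shift_mac and the example values are hypothetical; the thesis's actual circuit design is not shown here.

```python
def shift_mac(activations, shift_exps, signs):
    """Accumulate activation * weight where weight = sign * 2**exp.

    Each multiply becomes a left shift, so no hardware multiplier
    is needed -- the saving the thesis exploits on the FPGA.
    """
    acc = 0
    for a, k, s in zip(activations, shift_exps, signs):
        acc += s * (a << k)  # a * 2**k computed as a shift
    return acc

# Hypothetical integer activations and power-of-two weights 1, -2, 4, -1:
acts  = [3, 5, 2, 7]
exps  = [0, 1, 2, 0]
signs = [1, -1, 1, -1]
print(shift_mac(acts, exps, signs))  # 3 - 10 + 8 - 7 = -6
```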
Keywords/Search Tags:Deep learning, network compression, Quantized Neural Network, FPGA