
Study of Low Bit-Width Quantization of Deep Convolutional Neural Networks

Posted on: 2021-10-29    Degree: Master    Type: Thesis
Country: China    Candidate: H X Fan    Full Text: PDF
GTID: 2518306467476404    Subject: Signal and Information Processing
Abstract/Summary:
Deep convolutional neural networks are widely used in computer vision tasks such as image classification, object detection, and semantic segmentation, and they usually outperform traditional machine learning algorithms. Their success is largely attributable to the rapid growth of computing resources, so most deep convolutional neural networks are trained and run on GPUs. However, the high power consumption of GPUs limits the deployment of deep convolutional neural networks in edge-computing scenarios. A large body of work on compressing and accelerating convolutional neural networks has therefore emerged, aiming to deploy networks on resource-limited embedded devices by shrinking model storage, reducing computational complexity, and saving hardware resources. This thesis studies low bit-width quantization algorithms for deep convolutional neural networks from the perspective of FPGA hardware, so that the quantized networks can be deployed and run on FPGAs. The main contents are as follows:

(1) For neural network inference acceleration on general-purpose processors, this thesis analyzes the severe accuracy loss caused by low bit-width quantization and proposes a quantization method based on probability distribution correction. The method normalizes the network weights with a nonlinear transformation, which makes the corrected weight distribution more concentrated. The corrected weights are then quantized to a fixed-point format, which enables weight sharing, reduces storage, and both saves computing resources and accelerates computation on an FPGA. Finally, a scaling factor is applied to the weights to obtain a reasonable quantization result. The method improves the classification accuracy of low bit-width networks; a sketch of the idea is given below.

(2) For a special-purpose processor that uses the ABM-Sparse convolution algorithm, a non-uniform quantization method based on weight clustering is proposed. The method first clusters the weights, per convolutional layer or per convolution kernel, with the K-Means++ algorithm to realize weight sharing, which reduces the number of multiplications and accelerates the hardware. The clustered full-precision weights and the activations are then quantized to 8-bit dynamic fixed-point numbers, which reduces memory footprint and saves hardware resources. Experimental results show that the algorithm is simple and effective: model size and computation are compressed by a large factor while classification accuracy is preserved, so the network can be easily deployed on an FPGA for accelerated computation. A sketch of this procedure also follows.
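The following is a minimal sketch of the distribution-correction idea in (1). The abstract only states that a nonlinear transformation concentrates the weights before fixed-point quantization with a scaling factor; the specific power-law transform, the exponent `alpha`, and the per-tensor scale used here are illustrative assumptions, not the thesis's exact formulation.

```python
import torch

def quantize_with_distribution_correction(w: torch.Tensor, bits: int = 4, alpha: float = 0.5):
    """Sketch: correct the weight distribution with a nonlinear transform,
    then quantize to signed fixed point with a per-tensor scaling factor.

    The power-law transform |w|**alpha is an assumption for illustration;
    the thesis only specifies "a nonlinear transformation" that makes the
    weight distribution more concentrated.
    """
    sign = torch.sign(w)
    corrected = sign * w.abs().pow(alpha)          # concentrate the distribution

    qmax = 2 ** (bits - 1) - 1                     # e.g. 7 for 4-bit signed values
    scale = corrected.abs().max() / qmax           # per-tensor scaling factor
    q = torch.clamp(torch.round(corrected / scale), -qmax, qmax)

    # De-quantize and undo the correction so a full-precision simulation of
    # the quantized network can reuse the restored weights.
    deq = q * scale
    restored = torch.sign(deq) * deq.abs().pow(1.0 / alpha)
    return q.to(torch.int8), scale, restored
```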
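For (2), the sketch below clusters one layer's weights with K-Means++ (here via scikit-learn) and then quantizes the shared cluster centres to 8-bit dynamic fixed point. The cluster count, per-layer (rather than per-kernel) clustering, and the rule for choosing the fractional length are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_and_quantize_layer(weights: np.ndarray, n_clusters: int = 16, bits: int = 8):
    """Sketch: per-layer weight sharing via K-Means++ clustering, followed by
    8-bit dynamic fixed-point quantization of the shared values."""
    flat = weights.reshape(-1, 1)
    km = KMeans(n_clusters=n_clusters, init="k-means++", n_init=10).fit(flat)
    centers = km.cluster_centers_.ravel()          # shared weight values
    labels = km.labels_                            # per-weight cluster index

    # Dynamic fixed point: pick the fractional length so the largest centre
    # still fits into a signed `bits`-bit integer (one bit reserved for sign).
    int_bits = int(np.ceil(np.log2(np.abs(centers).max() + 1e-12))) + 1
    frac_bits = bits - int_bits
    q_centers = np.clip(np.round(centers * 2 ** frac_bits),
                        -(2 ** (bits - 1)), 2 ** (bits - 1) - 1)

    # Rebuild the layer from the shared, quantized centres.
    shared = q_centers[labels].reshape(weights.shape) / 2 ** frac_bits
    return shared, q_centers.astype(np.int8), frac_bits
```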
(3) For network models running on variable-precision arithmetic units that support dynamically variable bit precision, a mixed-precision quantization method based on a genetic algorithm is proposed. The quantization problem is formulated as a multi-objective optimization problem: the genetic algorithm automatically searches for the optimal bit width of each layer, and the weights and activations are then quantized at the selected mixed precisions to obtain a low bit-width model (a sketch follows). The whole procedure is automatic and efficient and achieves a high compression ratio. Experimental results show that the proposed algorithm compresses the network weights and activations to an average bit width of 2 bits and outperforms existing quantization methods.
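The sketch below shows a simple genetic algorithm over per-layer bit-width vectors in the spirit of (3). The `evaluate_accuracy` callback, the candidate bit widths, and the way the accuracy and average-bit-width objectives are folded into a single fitness score are all placeholders; the thesis's actual multi-objective formulation is not reproduced here.

```python
import random

def search_bit_widths(n_layers, evaluate_accuracy, choices=(2, 3, 4, 8),
                      pop_size=20, generations=30, target_avg_bits=2.0):
    """Sketch of a genetic algorithm for per-layer mixed-precision search.

    `evaluate_accuracy(bit_widths)` is a caller-supplied placeholder: it should
    quantize the network with the given per-layer bit widths and return the
    validation accuracy.
    """
    def fitness(ind):
        acc = evaluate_accuracy(ind)
        avg_bits = sum(ind) / len(ind)
        # Penalize individuals whose average bit width exceeds the target.
        return acc - 0.05 * max(0.0, avg_bits - target_avg_bits)

    pop = [[random.choice(choices) for _ in range(n_layers)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]           # keep the fitter half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, n_layers)    # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < 0.2:              # random mutation of one layer
                child[random.randrange(n_layers)] = random.choice(choices)
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)
```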
Keywords/Search Tags: Deep Convolutional Neural Network, Model compression, Weight quantization, Mixed-precision quantization