
Research On Weight-Interaction Quantization Of Deep Learning Models

Posted on: 2021-04-25
Degree: Master
Type: Thesis
Country: China
Candidate: G L Xiao
Full Text: PDF
GTID: 2428330611998866
Subject: Electrical engineering

Abstract
In recent years, deep neural networks have been applied in more and more fields, such as image recognition, object detection, and natural language processing. However, as the number of layers in a deep neural network grows, its demands on storage capacity and computing power also increase significantly, so it cannot run on small mobile devices with limited resources. Weight quantization is an optimization method that can effectively reduce the size and computational cost of a deep neural network. However, because of the error introduced by quantizing the weights, weight quantization performs poorly at low quantization bit widths. To reduce the quantization error of weight-quantization methods at low bit widths and improve the prediction accuracy of the quantized model, this thesis proposes a weight-interaction quantization algorithm. The main work and results are as follows.

First, based on a survey of the domestic and international literature on low-bit quantization, we analyze a defect of the traditional symmetric uniform quantization method: it cannot control the growth of accumulated quantization error. To address this defect, we apply the weight-interaction idea and propose a 2-bit weight-interaction quantization algorithm. The algorithm's advantages in reducing the accumulated quantization error and bounding the error are analyzed theoretically, and a complete mathematical model is established.

Then, for the hyperparameter search process, by analyzing the effect of the quantization errors of different convolutional layers on the output layer and combining this with the idea of a greedy algorithm, a layer-by-layer search algorithm is proposed: brute-force search is replaced by searching for locally optimal solutions that approximate the globally optimal solution.

Finally, the performance of the weight-interaction quantization algorithm is verified experimentally on a GPU software platform and an FPGA hardware platform. A comparative experiment shows that the algorithm effectively reduces the accumulated per-channel error and bounds the error. A comparison with the published results of domestic and international low-bit quantization algorithms shows that the algorithm effectively reduces the loss of prediction accuracy caused by weight quantization. Lastly, on the PYNQ-Z2 hardware platform, it is shown that the algorithm trades a small loss in prediction accuracy for a significant improvement in computational efficiency.
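The abstract does not spell out the weight-interaction update rule, but the core idea it claims (keeping the accumulated quantization error from growing unchecked, unlike independent symmetric quantization) can be sketched as an error-feedback ternary quantizer. All names, thresholds, and the feedback rule below are illustrative assumptions, not the thesis's actual algorithm:

```python
import numpy as np

def ternarize(w, delta, s):
    """Map one value to {-s, 0, +s} by thresholding at +/-delta
    (2-bit symmetric quantization of a single weight)."""
    if w > delta:
        return s
    if w < -delta:
        return -s
    return 0.0

def quantize_error_feedback(weights, delta, scale):
    """Quantize a 1-D weight vector sequentially, folding each weight's
    quantization error into the next weight before it is thresholded,
    so the errors interact instead of accumulating independently."""
    q = np.empty_like(weights)
    carry = 0.0                      # residual error carried forward
    for i, w in enumerate(weights):
        target = w + carry           # weight adjusted by the prior error
        q[i] = ternarize(target, delta, scale)
        carry = target - q[i]        # error passed to the next weight
    return q, carry                  # carry == sum(weights) - sum(q)
```

By telescoping, the returned `carry` equals the total quantization error of the whole vector, so the cumulative error stays small rather than growing linearly with the number of weights; quantizing each weight independently has no such bound.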
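The layer-by-layer search described above can be sketched as a greedy loop: fix each layer's hyperparameter to the locally best candidate given the choices already made for earlier layers, instead of brute-forcing every combination. The `evaluate` callback is a hypothetical stand-in for running the quantized network on validation data; none of these names come from the thesis:

```python
def greedy_layer_search(layers, candidates, evaluate):
    """Choose one hyperparameter per layer, front to back.

    For each layer, try every candidate while holding earlier layers at
    their already-chosen values, and keep the candidate with the lowest
    loss. This costs len(layers) * len(candidates) evaluations instead
    of the len(candidates) ** len(layers) of a brute-force search, at
    the price of returning a local rather than guaranteed global optimum.
    """
    chosen = {}
    for layer in layers:
        best_c, best_loss = None, float("inf")
        for c in candidates:
            trial = {**chosen, layer: c}   # earlier layers stay fixed
            loss = evaluate(trial)
            if loss < best_loss:
                best_c, best_loss = c, loss
        chosen[layer] = best_c
    return chosen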
Keywords: deep learning, ternarization, model compression, model acceleration, hyperparameter search