
Research On Weight-Interaction Quantization Of Deep Learning Models

Posted on: 2021-04-25
Degree: Master
Type: Thesis
Country: China
Candidate: G L Xiao
Full Text: PDF
GTID: 2428330611998866
Subject: Electrical engineering

Abstract
In recent years, deep neural networks have been applied in more and more fields, such as image recognition, object detection, and natural language processing. However, as the number of layers in a deep neural network grows, its demands on storage capacity and computing power also increase significantly, so it cannot run on small mobile devices with limited resources. Weight quantization is an optimization method that can effectively reduce the size and computational cost of a deep neural network. However, because of the error introduced by quantizing the weights, weight quantization performs poorly at low quantization bit widths. To reduce the quantization error of weight-quantization methods at low bit widths and improve the prediction accuracy of the quantized model, this thesis proposes a weight-interaction quantization algorithm. The main work and results are as follows.

First, based on a survey of the domestic and international literature on low-bit quantization, we analyze a defect of the traditional symmetric uniform quantization method: it cannot control the growth of accumulated quantization error. To address this defect, we apply the weight-interaction idea and propose a 2-bit weight-interaction quantization algorithm. The algorithm's advantages in reducing the accumulated quantization error and bounding the error are analyzed theoretically, and a complete mathematical model is established.

Then, for the hyperparameter search process, by analyzing the effect of the quantization errors of different convolutional layers on the output layer and combining this with the idea of a greedy algorithm, a layer-by-layer search algorithm is proposed: brute-force search is replaced by searching for locally optimal solutions that approximate the globally optimal solution.

Finally, the performance of the weight-interaction quantization algorithm is verified experimentally on a GPU software platform and an FPGA hardware platform. A comparative experiment shows that the algorithm effectively reduces the accumulated per-channel error and bounds the error. A comparison with the published results of domestic and international low-bit quantization algorithms shows that the algorithm effectively reduces the loss of prediction accuracy caused by weight quantization. Lastly, on the PYNQ-Z2 hardware platform, it is shown that the algorithm trades a small loss in prediction accuracy for a significant improvement in computational efficiency.
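The abstract does not spell out the weight-interaction update rule, but the core idea it claims (keeping the accumulated quantization error from growing unchecked, unlike independent symmetric quantization) can be sketched as an error-feedback ternary quantizer. All names, thresholds, and the feedback rule below are illustrative assumptions, not the thesis's actual algorithm:

```python
import numpy as np

def ternarize(w, delta, s):
    """Map one value to {-s, 0, +s} by thresholding at +/-delta
    (2-bit symmetric quantization of a single weight)."""
    if w > delta:
        return s
    if w < -delta:
        return -s
    return 0.0

def quantize_error_feedback(weights, delta, scale):
    """Quantize a 1-D weight vector sequentially, folding each weight's
    quantization error into the next weight before it is thresholded,
    so the errors interact instead of accumulating independently."""
    q = np.empty_like(weights)
    carry = 0.0                      # residual error carried forward
    for i, w in enumerate(weights):
        target = w + carry           # weight adjusted by the prior error
        q[i] = ternarize(target, delta, scale)
        carry = target - q[i]        # error passed to the next weight
    return q, carry                  # carry == sum(weights) - sum(q)
```

By telescoping, the returned `carry` equals the total quantization error of the whole vector, so the cumulative error stays small rather than growing linearly with the number of weights; quantizing each weight independently has no such bound.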
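The layer-by-layer search described above can be sketched as a greedy loop: fix each layer's hyperparameter to the locally best candidate given the choices already made for earlier layers, instead of brute-forcing every combination. The `evaluate` callback is a hypothetical stand-in for running the quantized network on validation data; none of these names come from the thesis:

```python
def greedy_layer_search(layers, candidates, evaluate):
    """Choose one hyperparameter per layer, front to back.

    For each layer, try every candidate while holding earlier layers at
    their already-chosen values, and keep the candidate with the lowest
    loss. This costs len(layers) * len(candidates) evaluations instead
    of the len(candidates) ** len(layers) of a brute-force search, at
    the price of returning a local rather than guaranteed global optimum.
    """
    chosen = {}
    for layer in layers:
        best_c, best_loss = None, float("inf")
        for c in candidates:
            trial = {**chosen, layer: c}   # earlier layers stay fixed
            loss = evaluate(trial)
            if loss < best_loss:
                best_c, best_loss = c, loss
        chosen[layer] = best_c
    return chosen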
Keywords: deep learning, ternarization, model compression, model acceleration, hyperparameter search