
Research And Implementation Of The Int8 Quantization Method Based On K-L Divergence

Posted on: 2021-04-10 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Qian | Full Text: PDF
GTID: 2428330602477687 | Subject: Electronic and communication engineering
Abstract/Summary:
We are in a period of rapid Internet development, and computer science plays an increasingly important role in daily life. Artificial intelligence (AI), the most popular branch of computer science in recent years, has gradually entered everyday applications such as intelligent robots, speech recognition, image recognition, and natural language processing. Convolutional neural networks (referred to simply as neural networks below), an important method in deep learning, offer excellent feature-extraction and noise-resistance capabilities. At the same time, their complexity imposes very high demands on data volume, computing power, and bandwidth. To address these problems, researchers have developed many optimizations specialized for neural networks, including compression, coding, and quantization. Quantization, as a commonly used optimization method, has achieved good results on most neural networks.

The main work of this thesis covers the following two aspects:

(1) To reduce the accuracy loss of quantized models, an int8 quantization method based on K-L (Kullback-Leibler) divergence is designed on top of the traditional int8 quantization method. Compared with a traditionally quantized model, the int8 model quantized with K-L divergence achieves higher accuracy and is suitable for high-precision scenarios such as "AI medical", "AI translation", and "target recognition".

(2) Building on an understanding of how a neural network execution framework is implemented, this thesis extends the framework's functionality. Following software design principles, a quantization module is implemented within the framework specifically to quantize float32 models; with it, users can easily convert a float32 model into an int8 model. The quantization module contains an online quantization module and an offline quantization module. With the online module, users can run the model directly, without any code modification, to verify the results of int8 quantization. With the offline module, this thesis provides a complete offline quantization scheme through which users can directly generate offline models. Ideally, an offline model can run independently without relying on any framework code, and users can integrate it directly into an application without considering differences between the framework and the production environment, which greatly simplifies model deployment.

The int8 quantization method and quantization module presented in this thesis form a complete neural network quantization scheme, and they have practical value for exploiting the low-power, highly concurrent computing characteristics of AI processors.
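The abstract does not give the exact calibration algorithm, but K-L divergence int8 calibration is commonly implemented by histogramming a layer's activations and searching for the saturation threshold whose int8-quantized distribution stays closest, in K-L divergence, to the original. The sketch below shows that standard search under illustrative assumptions (NumPy, 2048 histogram bins, 128 quantization levels, symmetric quantization); it is not the thesis's own implementation.

```python
import numpy as np

def kl_divergence(p, q):
    """K-L divergence between two discrete distributions (zero bins skipped)."""
    p = p / p.sum()
    q = q / q.sum()
    mask = (p > 0) & (q > 0)
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def find_kl_threshold(activations, num_bins=2048, target_bins=128):
    """Search for the saturation threshold that minimizes the K-L divergence
    between the original activation histogram and its int8-quantized version.
    num_bins and target_bins are illustrative defaults, not thesis values."""
    abs_vals = np.abs(activations)
    hist, bin_edges = np.histogram(abs_vals, bins=num_bins)
    best_threshold, best_kl = None, float("inf")
    for i in range(target_bins, num_bins + 1):
        # Reference distribution: clip all mass beyond bin i into the last bin.
        ref = hist[:i].astype(np.float64)
        ref[-1] += hist[i:].sum()
        # Candidate distribution: collapse the first i bins into target_bins
        # quantization levels, then expand back to i bins for comparison.
        q = np.zeros(i, dtype=np.float64)
        factor = i / target_bins
        for j in range(target_bins):
            start, stop = int(j * factor), int((j + 1) * factor)
            chunk = hist[start:stop].astype(np.float64)
            nonzero = chunk > 0
            if nonzero.any():
                q[start:stop][nonzero] = chunk[nonzero].sum() / nonzero.sum()
        kl = kl_divergence(ref, q)
        if kl < best_kl:
            best_kl, best_threshold = kl, bin_edges[i]
    return best_threshold

def quantize_int8(x, threshold):
    """Symmetric int8 quantization using the calibrated saturation threshold."""
    scale = threshold / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale
```

In this formulation, the traditional int8 method corresponds to taking the maximum absolute activation as the threshold; the K-L search instead allows clipping rare outliers when doing so preserves the overall distribution better, which is the source of the accuracy gain the abstract describes.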
Keywords/Search Tags:neural network execution framework, neural network, quantization method, K-L divergence, AI processor