
Research And Implementation Of The Int8 Quantization Method Based On K-L Divergence

Posted on: 2021-04-10 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Qian | Full Text: PDF
GTID: 2428330602477687 | Subject: Electronic and communication engineering
Abstract/Summary:
We are in a period of rapid Internet development, and computer science plays an increasingly important role in daily life. Artificial intelligence (AI), the most popular branch of computer science in recent years, has gradually entered everyday applications such as intelligent robots, speech recognition, image recognition, and natural language processing. Convolutional neural networks (referred to simply as neural networks below), an important method in deep learning, offer excellent feature-extraction and noise-resistance capabilities. At the same time, their complexity imposes very high demands on data volume, computing power, and bandwidth. To address these problems, researchers have developed many optimizations specialized for neural networks, including compression, coding, and quantization. Quantization, as a commonly used optimization method, has achieved good results on most neural networks.

The main work of this thesis covers the following two aspects:

(1) To reduce the accuracy loss of quantized models, an int8 quantization method based on K-L (Kullback-Leibler) divergence is designed on top of the traditional int8 quantization method. Compared with a traditionally quantized model, the int8 model quantized with K-L divergence achieves higher accuracy and is suitable for high-precision scenarios such as "AI medical", "AI translation", and "target recognition".

(2) Building on an understanding of how a neural network execution framework is implemented, this thesis extends the framework's functionality. Following software design principles, a quantization module is implemented within the framework specifically to quantize float32 models; with it, users can easily convert a float32 model into an int8 model. The quantization module contains an online quantization module and an offline quantization module. With the online module, users can run the model directly, without any code modification, to verify the results of int8 quantization. With the offline module, this thesis provides a complete offline quantization scheme through which users can directly generate offline models. Ideally, an offline model can run independently without relying on any framework code, and users can integrate it directly into an application without considering differences between the framework and the production environment, which greatly simplifies model deployment.

The int8 quantization method and quantization module presented in this thesis form a complete neural network quantization scheme, and they have practical value for exploiting the low-power, highly concurrent computing characteristics of AI processors.
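The abstract does not give the exact calibration algorithm, but K-L divergence int8 calibration is commonly implemented by histogramming a layer's activations and searching for the saturation threshold whose int8-quantized distribution stays closest, in K-L divergence, to the original. The sketch below shows that standard search under illustrative assumptions (NumPy, 2048 histogram bins, 128 quantization levels, symmetric quantization); it is not the thesis's own implementation.

```python
import numpy as np

def kl_divergence(p, q):
    """K-L divergence between two discrete distributions (zero bins skipped)."""
    p = p / p.sum()
    q = q / q.sum()
    mask = (p > 0) & (q > 0)
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def find_kl_threshold(activations, num_bins=2048, target_bins=128):
    """Search for the saturation threshold that minimizes the K-L divergence
    between the original activation histogram and its int8-quantized version.
    num_bins and target_bins are illustrative defaults, not thesis values."""
    abs_vals = np.abs(activations)
    hist, bin_edges = np.histogram(abs_vals, bins=num_bins)
    best_threshold, best_kl = None, float("inf")
    for i in range(target_bins, num_bins + 1):
        # Reference distribution: clip all mass beyond bin i into the last bin.
        ref = hist[:i].astype(np.float64)
        ref[-1] += hist[i:].sum()
        # Candidate distribution: collapse the first i bins into target_bins
        # quantization levels, then expand back to i bins for comparison.
        q = np.zeros(i, dtype=np.float64)
        factor = i / target_bins
        for j in range(target_bins):
            start, stop = int(j * factor), int((j + 1) * factor)
            chunk = hist[start:stop].astype(np.float64)
            nonzero = chunk > 0
            if nonzero.any():
                q[start:stop][nonzero] = chunk[nonzero].sum() / nonzero.sum()
        kl = kl_divergence(ref, q)
        if kl < best_kl:
            best_kl, best_threshold = kl, bin_edges[i]
    return best_threshold

def quantize_int8(x, threshold):
    """Symmetric int8 quantization using the calibrated saturation threshold."""
    scale = threshold / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale
```

In this formulation, the traditional int8 method corresponds to taking the maximum absolute activation as the threshold; the K-L search instead allows clipping rare outliers when doing so preserves the overall distribution better, which is the source of the accuracy gain the abstract describes.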
Keywords/Search Tags:neural network execution framework, neural network, quantization method, K-L divergence, AI processor