
Research On Compression Technology In Deep Neural Network Implementation

Posted on: 2021-01-21
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhuo
Full Text: PDF
GTID: 2428330614453598
Subject: IC Engineering
Abstract/Summary:
With the rapid development of neural network technology, networks are being designed at ever larger scales, with more layers and more complicated structures, which dramatically increases their parameter counts and computational cost. To meet the demand for high-performance neural network algorithms in everyday applications, deploying neural networks on mobile or portable devices such as embedded systems has become a new challenge. Because such devices have limited computing and storage capabilities and cannot run overly complex models, reducing the size and computational load of complex neural networks is a problem that any deployment must face.

This thesis studies the implementation and optimization of neural networks on devices with limited computing capabilities, such as mobile devices. To shrink model storage and reduce the computation required to run the model, it designs a hybrid neural network compression scheme based on information bottleneck theory. The research explores two directions:

1) A parameter importance measurement method based on information bottleneck theory. To better identify unimportant parameters, the method applies information bottleneck theory and introduces the variational inference techniques of Bayesian deep learning to derive a loss function grounded in the information bottleneck. This loss function is added as a regularization term to the original network's loss, making it possible to locate the network's redundant information; training iterations then concentrate that redundancy into a subset of neurons, and these redundant neurons are finally deleted to reduce the parameter count (a sketch of such a regularizer appears after the keywords below). Experiments show that, compared with other similar methods, the proposed information-bottleneck pruning removes redundant information between adjacent hidden layers, reduces the model's parameters and computation, and achieves higher accuracy.

2) A combination of pruning and quantization. Studies have shown that high-precision parameters are not necessary for high network performance. Because pruning only reduces the number of parameters and does not lower the precision of the remaining weights, the compression scheme improves on this point by combining the advantages of both techniques, reducing the original 32-bit parameters to 2 bits; this shrinks the storage the parameters require so the network fits on storage-constrained devices (a 2-bit quantization sketch also follows below). Experiments show that combining pruning with quantization greatly reduces the storage the model needs, with only a small loss in model performance.
Keywords/Search Tags: Neural network, Pruning, Information bottleneck, Model compression
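To make contribution 1) concrete, the following is a minimal PyTorch sketch of an information-bottleneck gate layer of the kind the abstract describes: per-neuron stochastic gates trained with a variational penalty, after which collapsed gates mark redundant neurons for deletion. The abstract does not give the thesis's exact derivation, so the gate parameterization (`gate_mu`, `gate_logvar`), the penalty form, and the pruning threshold here are illustrative assumptions, not the author's formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IBGatedLinear(nn.Module):
    # Linear layer whose outputs pass through stochastic Gaussian gates.
    # During training each output neuron i is scaled by z_i ~ N(mu_i, var_i)
    # (reparameterization trick); the penalty below pushes uninformative
    # gates toward zero so the corresponding neuron can be deleted.
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.gate_mu = nn.Parameter(torch.ones(out_features))
        self.gate_logvar = nn.Parameter(torch.full((out_features,), -6.0))

    def forward(self, x):
        h = self.linear(x)
        if self.training:
            std = torch.exp(0.5 * self.gate_logvar)
            z = self.gate_mu + std * torch.randn_like(std)  # sampled gate
        else:
            z = self.gate_mu  # deterministic gate at inference time
        return h * z

    def ib_penalty(self):
        # Grows with how much information each gate passes; a gate whose
        # mean is small relative to its noise contributes almost nothing.
        var = torch.exp(self.gate_logvar)
        return 0.5 * torch.sum(torch.log1p(self.gate_mu ** 2 / var))

    def prune_mask(self, threshold=1e-2):
        # Neurons whose gate mean collapsed below the threshold are the
        # "redundant neurons" removed after training.
        return self.gate_mu.abs() > threshold

def total_loss(model, x, y, beta=1e-4):
    # Task loss plus the weighted bottleneck penalty, mirroring the
    # "additional restriction term" described in the abstract.
    ce = F.cross_entropy(model(x), y)
    ib = sum(m.ib_penalty() for m in model.modules()
             if isinstance(m, IBGatedLinear))
    return ce + beta * ib
```

The trade-off between accuracy and compression is controlled here by the hypothetical coefficient `beta`: larger values drive more gates to zero and prune more aggressively.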
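For contribution 2), the sketch below shows one simple way to take the 32-bit weights that survive pruning down to 2 bits, as the abstract describes. The thesis does not specify its quantizer, so the symmetric per-tensor uniform scheme here (one float scale plus integer codes in {-2, ..., 1}) is an assumption; a real deployment would also pack four 2-bit codes into each stored byte.

```python
import torch

def quantize_2bit(w: torch.Tensor):
    # Symmetric uniform quantization to 2 bits: 4 integer levels
    # {-2, -1, 0, 1} plus one float scale per tensor. Storing packed
    # codes and the scale replaces 32 bits per weight with 2.
    scale = w.abs().max().clamp(min=1e-12) / 2.0
    codes = torch.clamp(torch.round(w / scale), -2, 1).to(torch.int8)
    return codes, scale

def dequantize_2bit(codes: torch.Tensor, scale: torch.Tensor):
    # Recover approximate float weights for inference.
    return codes.float() * scale

# Example: quantize a weight matrix and check the reconstruction error.
w = torch.randn(128, 128)
codes, scale = quantize_2bit(w)
w_hat = dequantize_2bit(codes, scale)
print("mean abs error:", (w - w_hat).abs().mean().item())
```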