
Neural Network Acceleration and Compression

Posted on: 2020-09-07
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Hu
Full Text: PDF
GTID: 2428330623456382
Subject: Computer technology
Abstract/Summary:
With the rapid development of Internet technology, deep learning has advanced quickly over the past ten years and attracted growing attention from researchers. It has been applied very successfully in many areas: from early handwritten digit recognition to the image recognition, detection, tracking, and speech recognition of recent years, deep network models have achieved results that traditional methods cannot. As deep learning continues to make breakthroughs in academia, industry has begun to port deep learning algorithms to hardware platforms. However, deploying deep neural networks in specific scenarios runs into an unavoidable problem: intelligent embedded devices such as mobile phones and wearable devices impose strict requirements on model size, computing performance, power consumption, etc., which limits the application of the computationally demanding deep neural network models mentioned above. Powerful and complex neural networks bring better performance, but they are accompanied by high storage and computational resource consumption, which makes deep learning algorithms difficult to port to mobile or embedded devices. In industry, therefore, the goal is to run these algorithms stably and efficiently on hardware platforms, and this is the starting point of this study: to compress neural network models as far as possible without loss of accuracy so that they can be ported to embedded devices.

First, this paper proposes a learning-based structured pruning strategy. A pruning module is added to the original network model to learn the importance of each convolution kernel in a convolutional layer; the importance scores are learned during pre-training, and the unimportant convolution kernels are then pruned (a minimal sketch of this idea follows the abstract).

Second, this paper proposes an improved quantization method, step-by-step quantization, which acts on the structured pruning: the pruned network model is quantized to compress it further. Quantization compresses the network by letting multiple weight parameters share the same parameter value. In the first step, our step-by-step method reassigns the outlier points misclassified by the initial quantization to the most appropriate class; in the second step, parameter sharing is completed. Compared with single-step quantization, the proposed step-by-step method reduces the loss of accuracy that quantization causes in the neural network model (a sketch of this idea also follows the abstract).

Finally, this paper applies the proposed compression method to face recognition and image super-resolution reconstruction. The network models are compressed without loss of accuracy, which also demonstrates the effectiveness of the neural network compression method proposed in this paper.
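The following is a minimal, illustrative sketch of the learning-based structured pruning idea described above, not the thesis's actual implementation. The module name `FilterImportance`, the `keep_ratio` parameter, and the top-k selection rule are assumptions introduced for illustration; PyTorch is used only as convenient notation, and the sketch assumes a plain convolution (no grouping or dilation).

```python
import torch
import torch.nn as nn

class FilterImportance(nn.Module):
    """Hypothetical pruning module: learns one importance score per
    convolution filter during pre-training. Filters whose learned
    importance is small are pruned after pre-training."""
    def __init__(self, num_filters):
        super().__init__()
        # one learnable importance score per output filter
        self.scores = nn.Parameter(torch.ones(num_filters))

    def forward(self, x):
        # x: (batch, channels, H, W); scale each channel by its score
        return x * self.scores.view(1, -1, 1, 1)

def prune_filters(conv, scores, keep_ratio=0.5):
    """Keep only the filters with the largest learned importance."""
    num_keep = max(1, int(conv.out_channels * keep_ratio))
    keep = torch.topk(scores.abs(), num_keep).indices.sort().values
    pruned = nn.Conv2d(conv.in_channels, num_keep,
                       conv.kernel_size, conv.stride, conv.padding,
                       bias=conv.bias is not None)
    with torch.no_grad():
        # copy the surviving filters' weights (and biases) over
        pruned.weight.copy_(conv.weight[keep])
        if conv.bias is not None:
            pruned.bias.copy_(conv.bias[keep])
    return pruned
```

In this reading, the pruning module is inserted after a convolutional layer so the importance scores are trained jointly with the network, and pruning reduces to dropping the low-score filters and rebuilding a thinner layer.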
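Likewise, below is a hedged sketch of the two-step (step-by-step) weight-sharing quantization described above. The uniform centroid initialization, the k-means-style refinement used to move outlier weights to their nearest class, and the `num_clusters` parameter are assumptions for illustration; the thesis's exact procedure may differ.

```python
import numpy as np

def stepwise_quantize(weights, num_clusters=16, iters=10):
    """Two-step weight sharing (illustrative sketch).

    Step 1: refine an initial uniform clustering so that outlier
    weights are reassigned to the most appropriate centroid.
    Step 2: complete parameter sharing by replacing every weight
    with the centroid of its cluster.
    """
    w = weights.ravel()
    # initial quantization: uniformly spaced centroids over the range
    centroids = np.linspace(w.min(), w.max(), num_clusters)
    # Step 1: k-means-style refinement moves mis-assigned outliers
    for _ in range(iters):
        assign = np.argmin(np.abs(w[:, None] - centroids[None, :]), axis=1)
        for k in range(num_clusters):
            if np.any(assign == k):
                centroids[k] = w[assign == k].mean()
    # Step 2: parameter sharing; every weight takes its centroid value
    assign = np.argmin(np.abs(w[:, None] - centroids[None, :]), axis=1)
    return centroids[assign].reshape(weights.shape), centroids
```

After parameter sharing, each weight only needs a small cluster index plus a shared codebook of centroids, which is where the storage compression comes from.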
Keywords/Search Tags: deep learning, neural network, compression, pruning, quantization