Research And Application Of Structured Model Compression Algorithm In Deep Neural Network

Posted on: 2021-03-20    Degree: Master    Type: Thesis
Country: China    Candidate: L S Wu    Full Text: PDF
GTID: 2428330626456035    Subject: Signal and Information Processing
Abstract/Summary:
In recent years, deep learning has developed rapidly and is gradually being applied in many fields such as computer vision and natural language processing. As a major branch of deep learning, convolutional neural networks have outperformed traditional methods on many tasks, including classification, detection, and segmentation. This success, however, rests on an enormous number of parameters: it is the stacking of varied convolutional kernels that allows the network to extract rich and diverse representative features. It also means that running a convolutional neural network requires substantial hardware support, a limitation that confines such networks to the laboratory and keeps them from running on mobile devices with limited storage and computing power. In practice, a network contains a great deal of parameter redundancy. This thesis therefore focuses on the model compression and optimized acceleration of deep neural networks. The main contributions are as follows.

First, to address the tendency of iterative pruning to accumulate errors, a sensitivity-based layer-by-layer pruning algorithm is proposed. We first consider the impact of a single convolutional layer on overall network performance, define the concept of convolutional-layer sensitivity, and measure the sensitivity of each convolutional layer quantitatively. On this basis, at each pruning rate the layers are pruned in order from the least sensitive to the most sensitive in the current state, with a greedy strategy applied across layers. This greatly reduces the performance loss after pruning any single layer, so subsequent iterative pruning proceeds from a higher level of network performance and the error amplification of iterative pruning is avoided. On the LeNet-5 and AlexNet models, our algorithm improves related indicators such as the compression ratio and acceleration ratio.

Second, to address the problem that pruning can easily delete critical filters in dense network models, we propose a multi-level filter pruning algorithm based on two-dimensional image entropy. Starting from the physical meaning of a filter, we define a filter-importance criterion based on the two-dimensional image entropy of the filter's output feature maps and measure each filter's feature-extraction capability quantitatively. Compared with criteria such as the filter norm or parameter sparsity, the output-entropy criterion is more accurate and yields more discriminative importance scores. In addition, to avoid accidentally deleting filters in dense, low-redundancy networks, we adopt a flexible (soft) pruning scheme in which an accidentally deleted filter can recover its critical parameters during the model reconstruction stage, making the model recoverable.

Finally, we compare the multi-level pruning algorithm experimentally with other related algorithms on typical network models. It reaches a pruning rate of more than 60% on LeNet-5 while the recognition rate drops by no more than 1.00%. On VGG-16 it achieves a 78.5% reduction in parameters and a 54.5% reduction in floating-point operations, and it reduces the memory footprint by 32.1%. Beyond pruning, we apply a K-means clustering back-end quantization algorithm to compress the pruned model further, reducing local storage by another 75.0%.
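The sensitivity-ordered, greedy layer-by-layer procedure can be summarized in a short sketch. The thesis does not publish code, so the fragment below is only a minimal illustration: `prune_layer`, `evaluate`, and `fine_tune` are hypothetical callbacks the reader would supply, and sensitivity is approximated here as the accuracy drop when a single layer is pruned in isolation.

```python
import copy

def layer_sensitivities(model, conv_names, rate, prune_layer, evaluate):
    """Sensitivity of a conv layer = accuracy drop when only that layer
    is pruned at the given rate, with all other layers left untouched."""
    baseline = evaluate(model)
    sens = {}
    for name in conv_names:
        trial = copy.deepcopy(model)      # prune a throwaway copy
        prune_layer(trial, name, rate)    # hypothetical helper
        sens[name] = baseline - evaluate(trial)
    return sens

def greedy_layerwise_prune(model, conv_names, rate,
                           prune_layer, evaluate, fine_tune):
    """Prune one layer at a time, least-sensitive first, re-measuring
    sensitivity after every step so errors do not accumulate."""
    remaining = list(conv_names)
    while remaining:
        sens = layer_sensitivities(model, remaining, rate,
                                   prune_layer, evaluate)
        target = min(sens, key=sens.get)  # least sensitive in current state
        prune_layer(model, target, rate)
        fine_tune(model)                  # brief recovery training
        remaining.remove(target)
    return model
```

Re-measuring the sensitivities after every pruning step is what keeps the greedy order aligned with the network's current state rather than its original one.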
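A common way to compute two-dimensional image entropy, which could serve as the filter-importance criterion described above, is from the joint histogram of pixel intensity and the local (3x3 neighbourhood) mean. The sketch below is our assumption about the concrete formulation; the bin count and neighbourhood size are illustrative choices, not values taken from the thesis.

```python
import numpy as np
from scipy.ndimage import uniform_filter  # 3x3 neighbourhood mean

def two_d_image_entropy(fmap, bins=64):
    """Two-dimensional image entropy of one feature map: Shannon entropy
    of the joint distribution (pixel intensity, 3x3 neighbourhood mean)."""
    f = fmap.astype(np.float64)
    f = (f - f.min()) / (np.ptp(f) + 1e-12)            # normalise to [0, 1]
    g = uniform_filter(f, size=3)                       # local mean image
    hist, _, _ = np.histogram2d(f.ravel(), g.ravel(),
                                bins=bins, range=[[0, 1], [0, 1]])
    p = hist / hist.sum()
    p = p[p > 0]                                        # drop empty bins
    return -np.sum(p * np.log2(p))

def filter_importance(feature_maps):
    """Average 2-D entropy of each filter's output over a batch.
    feature_maps: array of shape (batch, channels, H, W)."""
    b, c, _, _ = feature_maps.shape
    return np.array([[two_d_image_entropy(feature_maps[i, j])
                      for i in range(b)] for j in range(c)]).mean(axis=1)
```

Filters whose outputs carry little information produce near-uniform feature maps and hence low entropy, which is what makes the score more discriminative than a plain weight norm.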
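The flexible (soft) pruning scheme can be illustrated as masking rather than deleting: the least-important filters are zeroed in place, so if later fine-tuning shows a filter is critical, its gradients can regrow it before the final reconstruction discards the filters that are still zero. This is a minimal sketch under that interpretation, not the thesis implementation.

```python
import numpy as np

def soft_prune_step(weight, importance, prune_rate):
    """Zero the least-important filters in place instead of removing them.
    weight: conv kernel of shape (out_channels, in_channels, k, k);
    importance: one score per output filter (e.g. 2-D output entropy)."""
    n_prune = int(round(prune_rate * len(importance)))
    victims = np.argsort(importance)[:n_prune]  # lowest scores first
    weight[victims] = 0.0                       # masked, not deleted
    return victims                              # indices to re-check later
```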
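For the back-end quantization stage, a K-means weight-sharing sketch is shown below, using scikit-learn's KMeans as a stand-in for whatever implementation the thesis used. With 256 clusters, each 32-bit float weight is replaced by an 8-bit codebook index, which is consistent with the 75.0% storage reduction reported above; the cluster count itself is our assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_quantize(weights, n_clusters=256):
    """Back-end weight sharing: cluster a layer's weights with K-means,
    then store only per-weight cluster indices plus a small codebook."""
    w = weights.reshape(-1, 1)
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(w)
    codebook = km.cluster_centers_.ravel()      # shared weight values
    indices = km.labels_.astype(np.uint8)       # 8-bit index per weight
    quantized = codebook[indices].reshape(weights.shape)
    return quantized, codebook, indices
```

At inference time the layer uses the `quantized` array reconstructed from the codebook, so only `indices` and `codebook` need to be stored locally.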
Keywords/Search Tags: convolution sensitivity, greedy pruning, image entropy, soft pruning, K-means quantization