
Deep Neural Network Acceleration With Sparse Prediction Layers

Posted on: 2021-05-21
Degree: Master
Type: Thesis
Country: China
Candidate: Z T Yao
Full Text: PDF
GTID: 2428330614467668
Subject: Electronic Science and Technology
Abstract/Summary:
The ever-increasing computation cost of deep neural networks makes it imperative for real-world applications to accelerate the key steps, especially inference. To reduce this cost, the research is carried out in the following four areas: 1) A hypothesis is made that the sparse weights obtained after pruning still retain the location information of the important elements in the output; if the hypothesis holds, redundant computation in the output can be skipped. 2) Based on this hypothesis, the Sparse Prediction Layer (SPL) is proposed. It reduces the floating-point operations (FLOPs) of deep neural networks by skipping the exact computation of unimportant output elements, namely non-maximal values within max-pooling kernels and non-positive values before the ReLU layer, whose indices are predicted by the pruned sparse weights; the method requires no retraining. 3) A greedy parameter search method is proposed, which traverses the network layer by layer in a specific order to find an eligible sparsity for each layer, trading off FLOPs reduction against model accuracy. 4) For unstructured sparse output, a sparse-result convolution based on sparse-result matrix multiplication is proposed, demonstrating an engineering optimization for unstructured sparse outputs. Experiments on the ILSVRC-2012 dataset show that the SPL reduces FLOPs by 68.3%, 58.6%, and 59.5% on AlexNet, VGG-16, and ResNet-50, respectively, with less than 1% accuracy loss and without retraining. In addition, the SPL can be applied to models that have already been pruned: for some networks pruned by traditional methods, our approach further reduces FLOPs by more than half. Experiments with sparse result matrices confirm the feasibility of this engineering optimization.
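To make the prediction idea concrete, the following is a minimal sketch, not the thesis implementation, of how pruned sparse weights could cheaply predict which outputs of a fully connected layer survive ReLU, so that the exact dense computation is performed only at those positions. The function name spl_linear_relu, the shapes, and the use of NumPy are assumptions for illustration; the thesis applies the same principle to convolutional layers with max-pooling and ReLU.

```python
import numpy as np

def spl_linear_relu(x, w_dense, w_sparse):
    """Hypothetical sketch of the sparse-prediction idea for a fully
    connected layer followed by ReLU (names and shapes are assumptions).

    x        : (batch, in_features) input activations
    w_dense  : (in_features, out_features) original dense weights
    w_sparse : pruned copy of w_dense with most entries set to zero
    """
    # 1) Cheap prediction pass: the sparse weights cost few FLOPs and are
    #    assumed to preserve the sign/location of the important outputs.
    y_pred = x @ w_sparse

    # 2) Only positions predicted positive would survive ReLU, so the exact
    #    dense dot product is computed just for those elements.
    y = np.zeros_like(y_pred)
    rows, cols = np.nonzero(y_pred > 0)
    for i, j in zip(rows, cols):
        y[i, j] = max(x[i] @ w_dense[:, j], 0.0)  # exact value, ReLU applied
    return y

# Example usage with random data (illustrative only).
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64))
w = rng.standard_normal((64, 32))
w_pruned = np.where(np.abs(w) > 1.0, w, 0.0)  # crude magnitude pruning
y = spl_linear_relu(x, w, w_pruned)
```

In this sketch the FLOPs saving comes from the second pass touching only the predicted-positive positions; how the sparsity of w_sparse is chosen per layer corresponds to the greedy search described above.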
Keywords/Search Tags:deep learning, neural network acceleration, network pruning, high-performance computing