
Deep Neural Network Acceleration With Sparse Prediction Layers

Posted on: 2021-05-21
Degree: Master
Type: Thesis
Country: China
Candidate: Z T Yao
Full Text: PDF
GTID: 2428330614467668
Subject: Electronic Science and Technology
Abstract/Summary:
The ever-increasing computation cost of deep neural networks makes it imperative for real-world applications to accelerate the key steps, especially inference. To reduce this cost, the research is carried out in the following four areas: 1) A hypothesis is made that the sparse weights obtained after pruning still retain the location information of the important elements in the output; if the hypothesis holds, redundant computation in the output can be skipped. 2) Based on this hypothesis, the Sparse Prediction Layer (SPL) is proposed. It reduces the floating-point operations (FLOPs) of deep neural networks by skipping the exact computation of unimportant output elements, namely non-maximal values within max-pooling kernels and non-positive values before the ReLU layer, whose indices are predicted by the pruned sparse weights; the method requires no retraining. 3) A greedy parameter search method is proposed, which traverses the network layer by layer in a specific order to find an eligible sparsity for each layer, trading off FLOPs reduction against model accuracy. 4) For unstructured sparse output, a sparse-result convolution based on sparse-result matrix multiplication is proposed, demonstrating an engineering optimization for unstructured sparse outputs. Experiments on the ILSVRC-2012 dataset show that the SPL reduces FLOPs by 68.3%, 58.6%, and 59.5% on AlexNet, VGG-16, and ResNet-50, respectively, with less than 1% accuracy loss and without retraining. In addition, the SPL can be applied to models that have already been pruned: for some networks pruned by traditional methods, our approach further reduces FLOPs by more than half. Experiments with sparse result matrices confirm the feasibility of this engineering optimization.
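To make the prediction idea concrete, the following is a minimal sketch, not the thesis implementation, of how pruned sparse weights could cheaply predict which outputs of a fully connected layer survive ReLU, so that the exact dense computation is performed only at those positions. The function name spl_linear_relu, the shapes, and the use of NumPy are assumptions for illustration; the thesis applies the same principle to convolutional layers with max-pooling and ReLU.

```python
import numpy as np

def spl_linear_relu(x, w_dense, w_sparse):
    """Hypothetical sketch of the sparse-prediction idea for a fully
    connected layer followed by ReLU (names and shapes are assumptions).

    x        : (batch, in_features) input activations
    w_dense  : (in_features, out_features) original dense weights
    w_sparse : pruned copy of w_dense with most entries set to zero
    """
    # 1) Cheap prediction pass: the sparse weights cost few FLOPs and are
    #    assumed to preserve the sign/location of the important outputs.
    y_pred = x @ w_sparse

    # 2) Only positions predicted positive would survive ReLU, so the exact
    #    dense dot product is computed just for those elements.
    y = np.zeros_like(y_pred)
    rows, cols = np.nonzero(y_pred > 0)
    for i, j in zip(rows, cols):
        y[i, j] = max(x[i] @ w_dense[:, j], 0.0)  # exact value, ReLU applied
    return y

# Example usage with random data (illustrative only).
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64))
w = rng.standard_normal((64, 32))
w_pruned = np.where(np.abs(w) > 1.0, w, 0.0)  # crude magnitude pruning
y = spl_linear_relu(x, w, w_pruned)
```

In this sketch the FLOPs saving comes from the second pass touching only the predicted-positive positions; how the sparsity of w_sparse is chosen per layer corresponds to the greedy search described above.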
Keywords/Search Tags:deep learning, neural network acceleration, network pruning, high-performance computing