
Compression And Acceleration Method Of Neural Network For Lightweight And High Energy Efficiency

Posted on: 2024-07-07    Degree: Master    Type: Thesis
Country: China    Candidate: B S Liang    Full Text: PDF
GTID: 2568306914465694    Subject: Information and Communication Engineering
Abstract/Summary:
In recent years, deep neural network technology has developed rapidly and achieved remarkable results in both research and application. As research and deployment have deepened, the scale of deep neural networks has kept growing in pursuit of better accuracy and generalization. The resulting surge in computation and parameter count poses challenges for deploying deep neural networks, such as higher energy consumption, a limited range of deployable devices, and difficulty meeting real-time requirements. How to compress neural network models, accelerate inference, and reduce inference energy consumption is therefore a hot issue in both academia and industry. LSTM, a deep neural network for processing sequential data, is widely used in natural language processing, speech recognition, and other fields. In this paper, we take LSTM as the object of model compression and inference acceleration, aiming to achieve high-performance, energy-efficient inference by making the network lightweight and building a heterogeneous computing platform.

For LSTM weight pruning, this paper introduces a knowledge distillation-based strategy for recovering the accuracy of pruned models. Building on this strategy, two compression methods are proposed: one that distills from the original model and one that distills from a BERT model. Experimental results show that, for both fine-grained and coarse-grained pruning at the same sparsity, the compressed model accuracy is higher than that obtained by fine-tuning. In the experiments, coarse-grained pruning, the accuracy recovery strategy, and quantization are combined to compress a BiLSTM. The compressed model achieves a 2.3x speedup and a 2.8x energy efficiency improvement over the original model on the GPU platform without any loss of accuracy.

To implement high-performance LSTM inference on FPGA, this paper introduces row-balanced pruning into LSTM compression and designs a storage format, CBSR, together with a matrix-vector multiplication kernel tailored to the row-balanced sparsity. The LSTM accelerator is further optimized with fixed-point quantization and reconstructed activation functions. Experimental results show that these optimizations effectively reduce FPGA resource utilization and LSTM inference time. On this basis, the LSTM accelerator is combined with a CPU to build a heterogeneous computing platform that provides computational support for LSTM-based algorithms. Experimental results show that, without losing model accuracy, the platform achieves a 2.07x speedup and 7.6x energy efficiency improvement over the GPU, and a 7.5x speedup and 17.7x energy efficiency improvement over the CPU. These results also validate the rationality of building such a heterogeneous computing platform.
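To make the accuracy-recovery idea concrete, the following is a minimal sketch of a knowledge-distillation loss in which the dense (unpruned) LSTM acts as the teacher and the pruned LSTM as the student. The thesis does not spell out the exact loss formulation; the PyTorch-style setup and names such as distillation_loss, temperature, and alpha are illustrative assumptions, not the author's implementation.

    # Hedged sketch: knowledge-distillation loss for recovering the accuracy
    # of a pruned LSTM. Teacher = dense model, student = pruned model.
    # The temperature and alpha values are illustrative assumptions.
    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels,
                          temperature=4.0, alpha=0.5):
        # Soft-target term: match the student's softened output distribution
        # to the teacher's softened output distribution.
        soft_loss = F.kl_div(
            F.log_softmax(student_logits / temperature, dim=-1),
            F.softmax(teacher_logits / temperature, dim=-1),
            reduction="batchmean",
        ) * (temperature ** 2)
        # Hard-target term: ordinary cross-entropy against ground-truth labels.
        hard_loss = F.cross_entropy(student_logits, labels)
        # Weighted combination of the two terms.
        return alpha * soft_loss + (1.0 - alpha) * hard_loss

During pruned-model retraining, this loss would replace the plain cross-entropy used in fine-tuning, which is what the comparison "higher accuracy than fine-tuning at the same sparsity" refers to.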
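Row-balanced pruning keeps the same number of nonzero weights in every row of a weight matrix, which is what allows a compact, regular storage format and a simple matrix-vector kernel on FPGA. Below is a minimal software sketch of such a kernel under that assumption; the exact CBSR layout used in the thesis is not described here, so the two-array layout (per-row values and column indices) is only an illustrative stand-in.

    # Hedged sketch: matrix-vector product over a row-balanced sparse matrix.
    # Each row keeps exactly k nonzeros, so values and col_idx are dense
    # (rows, k) arrays; this layout is an assumed stand-in for CBSR.
    import numpy as np

    def row_balanced_spmv(values, col_idx, x):
        rows, k = values.shape
        y = np.zeros(rows, dtype=values.dtype)
        for r in range(rows):
            # Every row does the same k multiply-accumulates, which maps
            # naturally onto a fixed number of parallel lanes in hardware.
            for j in range(k):
                y[r] += values[r, j] * x[col_idx[r, j]]
        return y

The regular per-row workload is the point of row balancing: unlike general CSR, no row finishes early, so hardware lanes stay evenly utilized.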
Keywords/Search Tags: LSTM, knowledge distillation, weight pruning, FPGA, heterogeneous computing