Research On Deep Neural Network Compression And Acceleration Method Based On Channel Pruning

Posted on: 2023-06-12

Degree: Master

Type: Thesis

Country: China

Candidate: Y Cao

Full Text: PDF

GTID: 2568306614493594

Subject: Engineering

Abstract/Summary:

In recent years, deep neural networks have developed rapidly and are now widely used in fields such as computer vision, natural language processing, speech recognition, sentiment analysis, text feature extraction, and bioinformatics. To achieve better results, networks have been built with more and more layers, which steadily increases the number of parameters and places higher demands on the computing power and storage capacity of the hardware. At the same time, low-power, portable edge devices such as smart wearables have become common. Because deep neural networks perform so well across these fields, there is great interest in running them on edge devices, but the limited computing power and storage of such devices make this difficult. Reducing the number of parameters of deep neural networks has therefore become an active research topic, and many acceleration and compression techniques have been proposed to ease deployment on edge devices.

Previous compression methods often neglect the loss of accuracy in pursuit of a high compression rate. Although fine-tuning can restore accuracy to a good level, a large drop in accuracy immediately after compression means that much important information was discarded during the compression process. This thesis studies how to reduce the loss of accuracy while maintaining a high compression rate for deep neural networks, and consists of the following two parts.

(1) We propose a channel-level pruning method that removes unimportant channels from the network, reducing the number of parameters while preserving the performance of the pruned network. Specifically, to reduce channel redundancy more effectively, the method introduces a K-order statistic in the Batch Normalization layer and measures the importance of each channel by the accumulated value of its Batch Normalization scaling factors. Channels whose scaling factors have low cumulative values are removed, yielding a network with fewer parameters, and the accuracy of the compressed network is then restored by fine-tuning. Experiments on two datasets, CIFAR-10 and CIFAR-100, show that the method reduces the loss of accuracy after compression while keeping compression efficiency high, and that it reaches higher accuracy after fine-tuning.

(2) Channel pruning methods often use a fixed threshold as the pruning ratio, which tends to prune either too little or too much. To address the choice of channel pruning ratio, this thesis applies the Jarque-Bera normality test after computing the K-order statistics of the scaling factors in the Batch Normalization layer. The test determines whether the cumulative values of the scaling factors conform to a normal distribution; from this the channel pruning ratio is calculated, and the channels whose scaling factors do not conform to the normal distribution are pruned. The experiments compare model performance under a fixed-threshold pruning ratio with performance under the ratio determined automatically by the Jarque-Bera test. The accuracy loss is smaller with the automatically determined ratio, which indicates that the method prunes redundant channels adequately and reduces the error rate of channel pruning. We also compare the training time of the model before and after pruning: training time is reduced after pruning with the automatically determined ratio, so the method accelerates model training.

Compared with current mainstream methods, the proposed method achieves high accuracy after pruning, good accuracy recovery after fine-tuning, and short training time at a similar parameter compression ratio, which makes it feasible to apply deep neural networks on resource-constrained embedded platforms. Finally, this thesis deploys a small neural network on an STM32F103RC embedded device and conducts migration (porting) experiments on the Iris dataset, laying the foundation for future deployment of larger neural networks on embedded devices.
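
To illustrate part (2), the sketch below shows one way a Jarque-Bera test could be used to derive a pruning ratio from per-channel scores. It is not the thesis's algorithm: the abstract does not give the exact procedure, so the trimming loop, the function names `jarque_bera` and `select_pruning_ratio`, the significance level `alpha`, and the `min_keep` floor are all assumptions made for illustration. The input scores stand in for the accumulated Batch Normalization scaling factors. Only the statistic itself is standard: JB = n/6 * (S^2 + (K - 3)^2 / 4), where S is the sample skewness and K the sample kurtosis, with an asymptotic chi-square p-value on 2 degrees of freedom.

```python
import math
from statistics import NormalDist


def jarque_bera(values):
    """Return (JB statistic, asymptotic p-value) for a sample of numbers."""
    n = len(values)
    mean = sum(values) / n
    # Biased (divide-by-n) central moments, as in the standard JB statistic.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0.0:
        raise ValueError("sample has zero variance")
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Under normality JB is asymptotically chi-square with 2 degrees of
    # freedom, whose survival function is exactly exp(-x / 2).
    return jb, math.exp(-jb / 2.0)


def select_pruning_ratio(scores, alpha=0.05, min_keep=8):
    """Hypothetical heuristic (an assumption, not the thesis's procedure).

    Drop the lowest-scoring channels one at a time until the remaining
    scores are no longer rejected by the JB test at level `alpha`.
    Returns (pruning ratio, sorted indices of channels to prune).
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])  # ascending by score
    for cut in range(n - min_keep + 1):
        kept = [scores[i] for i in order[cut:]]
        try:
            _, p_value = jarque_bera(kept)
        except ValueError:
            continue  # constant remainder: treat as failing the test
        if p_value >= alpha:
            return cut / n, sorted(order[:cut])
    return 0.0, []  # no cut passed the test: prune nothing


if __name__ == "__main__":
    # Synthetic scores (made up for the demo): 40 near-zero channels
    # followed by 160 channels whose scores follow a bell-shaped spread.
    bulk = NormalDist(mu=0.5, sigma=0.1)
    scores = [0.0001 * i for i in range(40)]
    scores += [bulk.inv_cdf((i + 0.5) / 160) for i in range(160)]
    ratio, pruned = select_pruning_ratio(scores)
    print(f"pruning ratio: {ratio:.3f}, channels pruned: {len(pruned)}")
```

The loop recomputes the test for every candidate cut, which is quadratic in the number of channels; that is acceptable for a sketch but would need an incremental moment update for very wide layers. With small samples the Jarque-Bera test has low power, which is why the sketch stops at a `min_keep` floor instead of trimming indefinitely.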

Keywords/Search Tags:

K-order statistics, Neural network compression, Deep neural network, Channel pruning, Normal distribution test

Related items

1	Research On Deep Neural Network Compression And Acceleration Based On Channel Pruning
2	Research On Deep Neural Network Compression Method Based On Soft Masks Learning
3	Similarity-Based Approach To Neural Network Pruning
4	Research On Deep Neural Network Compression And Acceleration Based On Parameter Pruning
5	Research Of The Model Compression Algorithm For Deep Neural Network
6	A Deep Neural Network Pruning Method Based On Structural Search
7	On The Learning And Compression Of Deep Neural Network Structure
8	Research On Embedded Object Detection Based On Deep Neural Network Compression
9	The Study Of Pruning Methods Of Deep Neural Network
10	Research On Edge Computing Deep Neural Network Optimization Technology Based On Channel Pruning