Artificial intelligence is leading the trend of contemporary technological development. It has not only been widely applied in areas such as autonomous driving, face recognition, machine translation, and search recommendation, but has also made breakthroughs in content generation, language dialogue, and other fields that have attracted worldwide attention. Deep neural networks, as the core of artificial intelligence, play a vital role in these advances. However, to achieve better performance, deep neural networks are becoming increasingly deep, with growing numbers of parameters and more complex structures, which hinders their deployment on resource-constrained hardware such as smartphones and wearable devices. How to compress deep neural networks effectively is therefore key to promoting the application of artificial intelligence, and it has attracted wide attention from researchers and practitioners. This thesis studies deep neural network compression from the two aspects of network pruning and parameter quantization, and proposes solutions to the problems of traditional network pruning and parameter quantization methods. The main research work and innovations of this thesis are summarized as follows:

1. To address the problems that traditional pruning methods usually ignore the collaborative relationship between the corresponding filters of consecutive convolutional layers, and that hand-designed pruning criteria cannot avoid human interference, we propose consecutive-layer collaborative filter similarity for network pruning. Considering that a filter in the current layer has a natural synergistic relationship with the corresponding filter channel in the next layer, consecutive-layer collaborative filters are constructed to make full use of the complete filter information, and cosine similarity is used to compute the similarity matrix of the collaborative filters. To exploit the similarity information from a global perspective, linear layers are introduced to learn binary selection vectors that automatically prune or preserve filters. To make the whole pruning framework fully differentiable, a piecewise polynomial function is designed to approximate the gradient of the activation function, and an efficiency constraint is introduced into the network loss, so that the pruning process is optimized by gradient descent and the pruned network achieves a better balance between accuracy and efficiency; a minimal sketch of this pipeline is given below.
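As one possible reading of the pipeline in contribution 1, the sketch below (PyTorch-style; the tensor shapes, the triangle-shaped polynomial surrogate, and the form of the efficiency term are illustrative assumptions rather than the thesis's exact design) shows how consecutive-layer collaborative filters can be flattened and compared with cosine similarity, and how a linear layer can emit binary selection vectors through a step function whose backward pass uses a piecewise polynomial surrogate gradient.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def collaborative_similarity(w_cur, w_next):
    """Cosine-similarity matrix of consecutive-layer collaborative filters.

    w_cur  : weights of the current conv layer, shape (C_out, C_in, k, k)
    w_next : weights of the next conv layer,    shape (C_next, C_out, k, k)

    The i-th collaborative filter concatenates the i-th output filter of the
    current layer with the i-th input-channel slice of the next layer, so a
    pruning decision sees the complete information carried by one channel.
    """
    cur = w_cur.flatten(1)                      # (C_out, C_in*k*k)
    nxt = w_next.transpose(0, 1).flatten(1)     # (C_out, C_next*k*k)
    collab = torch.cat([cur, nxt], dim=1)       # collaborative filters
    collab = F.normalize(collab, dim=1)
    return collab @ collab.t()                  # (C_out, C_out) cosine similarities


class BinaryGate(torch.autograd.Function):
    """Hard 0/1 step in the forward pass; a piecewise polynomial surrogate
    (here a triangle-shaped one, an assumed choice) replaces its zero
    gradient in the backward pass so the framework stays differentiable."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return (x > 0).float()

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out * (1.0 - x.abs()).clamp(min=0.0)


class FilterSelector(nn.Module):
    """Linear layer mapping each row of the similarity matrix to a binary
    keep/prune decision for the corresponding filter."""

    def __init__(self, num_filters):
        super().__init__()
        self.fc = nn.Linear(num_filters, 1)

    def forward(self, sim):                     # sim: (C_out, C_out)
        scores = self.fc(sim).squeeze(-1)       # one score per filter
        return BinaryGate.apply(scores)         # binary selection vector

# During training, the selection vector masks the corresponding output
# channels, and an efficiency term such as
#     loss = task_loss + lambda * mask.mean()
# (the retained-filter ratio as a proxy for cost) keeps accuracy and
# efficiency jointly optimized by gradient descent.
```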
2. To address the problem that traditional quantization-aware training methods usually do not consider the differing importance of weight parameters and quantize all parameters indiscriminately at the same time, an incremental parameter quantization method is proposed. Drawing on effective filter-importance criteria from network pruning, the filter weights and the scaling factors of the batch normalization layers are combined to evaluate the importance of the different weight parameters in the network, which are accordingly divided into an important part and a non-important part. To make the entire quantization process more stable, quantization is carried out in multiple stages: in the current stage, the important parameters are quantized and fixed, while the non-important parameters continue to be retrained and updated; the next stage then repeats the importance evaluation and division. By repeating the three steps of "division-quantization-retraining", all weight parameters in the network are quantized incrementally, which reduces the cumulative quantization error while maintaining network performance; a sketch of this staged loop is given after the experimental summary below.

3. This thesis conducts a series of experiments on the CIFAR-10, CIFAR-100, and ImageNet datasets using the VGG-16, ResNet-20, ResNet-56, ResNet-110, and ResNet-50 architectures. Comparisons with several state-of-the-art compression methods demonstrate the superiority of the proposed compression methods. In addition, extensive ablation experiments and analyses are conducted to verify their effectiveness.
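Contribution 2's "division-quantization-retraining" loop can be illustrated with the following hedged sketch (PyTorch-style; the importance score as the product of the filter's L2 norm and the batch-normalization scaling factor, the symmetric uniform quantizer, the per-stage quantization fraction, and the gradient-masking hook are illustrative assumptions, not the thesis's exact formulation).

```python
import torch
import torch.nn as nn

def filter_importance(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> torch.Tensor:
    """Per-filter importance combining the filter weight norm with the
    scaling factor (gamma) of the following batch-normalization layer."""
    weight_norm = conv.weight.detach().flatten(1).norm(p=2, dim=1)   # (C_out,)
    gamma = bn.weight.detach().abs()                                 # (C_out,)
    return weight_norm * gamma


def quantize_uniform(w: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Symmetric uniform quantization of a weight tensor."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    return torch.round(w / scale).clamp(-qmax, qmax) * scale


def incremental_quantization(model, pairs, retrain, num_stages=4, num_bits=8):
    """Division-quantization-retraining loop over `num_stages` stages.

    pairs   : list of (conv, bn) module pairs evaluated jointly
    retrain : caller-supplied routine that retrains the still-floating parameters
    """
    fixed = {id(conv): torch.zeros(conv.out_channels, dtype=torch.bool,
                                   device=conv.weight.device)
             for conv, _ in pairs}
    # Zero the gradients of already-quantized filters so retraining only
    # updates the not-yet-fixed (non-important) parameters.
    for conv, _ in pairs:
        mask = fixed[id(conv)]
        conv.weight.register_hook(
            lambda g, m=mask: g * (~m).view(-1, 1, 1, 1).to(g.device, g.dtype))

    for stage in range(num_stages):
        for conv, bn in pairs:                                   # 1) division
            remaining = (~fixed[id(conv)]).sum().item()
            if remaining == 0:
                continue
            score = filter_importance(conv, bn)
            score[fixed[id(conv)]] = float("-inf")               # skip fixed filters
            k = max(1, remaining // (num_stages - stage))        # per-stage fraction
            idx = score.topk(k).indices
            with torch.no_grad():                                # 2) quantization
                conv.weight[idx] = quantize_uniform(conv.weight[idx], num_bits)
            fixed[id(conv)][idx] = True
        retrain(model)                                           # 3) retraining
    return model
```

Because the gradient hook zeroes the updates of already-quantized filters, only the not-yet-fixed parameters change during each retraining phase, mirroring the incremental schedule described in contribution 2.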