
Study On Model Compression Algorithm Based On Target Recognition Deep Network

Posted on: 2022-09-15    Degree: Master    Type: Thesis
Country: China    Candidate: J Y Zhang    Full Text: PDF
GTID: 2518306485956479    Subject: Signal and Information Processing
Abstract/Summary:
The computing and storage resources that deep convolutional neural networks (CNNs) depend on severely restrict their deployment on resource-limited embedded platforms. Pruning, one of the most widely used model compression methods, removes redundant convolution kernels to reduce the width or depth of a network, cutting its parameter count and accelerating forward inference. This thesis studies model compression algorithms for target-recognition deep networks at two granularities, network width and network depth. It addresses the one-sidedness of using a single parameter metric during pruning and the difficulty of measuring the similarity of convolutional layers, and obtains faster and more accurate lightweight models.

To address the one-sidedness of a single parameter metric, the thesis proposes an integrated pruning algorithm based on sensitivity and applies it in practice. Different networks and data sets respond differently to different pruning criteria, so the thesis examines the effectiveness of three parameter metrics and sums the filter-importance values obtained from them into a single score per convolution filter. The accuracy loss incurred when a convolutional layer is pruned by a fixed proportion is taken as that layer's sensitivity; the number of filters to prune in each layer is computed from its sensitivity, and the least important filters in each layer are then removed according to their scores. Pruning experiments are conducted on the YOLOv3 and YOLOv3-tiny networks, using the 20 target classes of the VOC data set and its person class respectively. The lightweight networks have a more regular structure and higher accuracy: YOLOv3 achieves 80.4% parameter compression with an inference time 58% of the original, and YOLOv3-tiny achieves 92.5% parameter compression with an inference time on the NVIDIA Jetson TX2 platform that is 28% of the original.

The algorithm is also applied in practice: detection-network pruning experiments are performed on a simple target category, a six-class drone data set, and a two-class drone data set. The detection ability of the lightweight models is not reduced while their speed increases; the lightweight two-class drone model runs on the laboratory's TX2 development board at 36 frames per second, up from 20 frames per second.

Although convolution-filter pruning yields a narrower network structure and faster inference, in very deep networks the data I/O between convolutional layers still consumes considerable time. The thesis therefore prunes entire convolutional layers to address this problem. Because the similarity of adjacent convolutional layers is difficult to measure, it proposes a layer-pruning method based on the spatial similarity of feature maps. Since the difference between feature maps depends largely on the difference in their spatial edge features, the method computes the minimum difference between the edge features of the feature maps of adjacent layers: the smaller this difference, the more similar the adjacent layers and their feature-extraction capacity, and only one of the two layers needs to be retained during layer pruning. Because the two-layer residual structure of the YOLO series is unsuitable for layer pruning, the study uses the YOLO-Fastest-xl network, whose backbone is EfficientNet-Lite, and designs corresponding pruning schemes for the different convolutional layers in its residual blocks. Based on the experimental results, the influence of grouped convolution on inference speed is analyzed and the advantages and disadvantages of the different pruning schemes are compared. The resulting lightweight model prunes 19 convolutional layers, achieving 34.6% parameter compression with an inference time 80% of the original. Compared with the YOLO-Fastest network, which is narrower than YOLO-Fastest-xl, the layer-pruned model's inference time is 82.4% of YOLO-Fastest's and its accuracy is 0.013 higher, verifying the effectiveness of the algorithm.

In summary, this thesis addresses the one-sidedness of a single parameter metric in convolution-filter pruning and the measurement of inter-layer similarity in convolutional-layer pruning. The model is compressed at two different granularities and its forward inference is accelerated, which has positive significance for the deployment of deep learning models.
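The integrated filter-pruning idea above can be sketched as follows. The abstract does not name the three parameter metrics, so L1 norm, L2 norm, and distance to the layer mean (a crude stand-in for a geometric-median criterion) are used purely as illustrative assumptions, as is the min-max normalization before summing and the linear mapping from sensitivity to pruning ratio.

```python
import numpy as np


def filter_importance_scores(weights):
    """Integrated importance score for each conv filter.

    weights: array of shape (out_channels, in_channels, k, k).
    The three metrics below are illustrative assumptions; the thesis
    sums three metrics but does not name them in the abstract.
    """
    flat = weights.reshape(weights.shape[0], -1)
    l1 = np.abs(flat).sum(axis=1)                      # metric 1: L1 norm
    l2 = np.sqrt((flat ** 2).sum(axis=1))              # metric 2: L2 norm
    center = flat.mean(axis=0)                         # stand-in for geometric median
    gm = np.sqrt(((flat - center) ** 2).sum(axis=1))   # metric 3: distance to center

    def norm(x):  # min-max normalize so the metrics are comparable before summing
        return (x - x.min()) / (x.max() - x.min() + 1e-12)

    return norm(l1) + norm(l2) + norm(gm)


def filters_to_prune(scores, sensitivity, base_ratio=0.5):
    """Indices of the least important filters in one layer.

    sensitivity: accuracy loss observed when this layer alone is pruned
    by a fixed proportion (higher = more sensitive, so prune less).
    The linear scaling of the ratio is a simple illustrative choice.
    """
    ratio = base_ratio * (1.0 - sensitivity)
    n_prune = int(len(scores) * max(ratio, 0.0))
    return np.argsort(scores)[:n_prune]  # lowest integrated scores first
```

For example, a layer with 8 filters, sensitivity 0.2, and base ratio 0.5 would have its 3 lowest-scoring filters selected for removal.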
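The spatial-similarity measure for layer pruning can likewise be sketched. The abstract does not specify the edge extractor or the exact matching scheme, so a gradient-magnitude operator and a per-channel minimum mean-absolute-difference matching are assumptions for illustration only.

```python
import numpy as np


def edge_map(fm):
    """Spatial edge features of one feature-map channel of shape (H, W).

    A simple gradient-magnitude operator stands in for whatever edge
    extractor the thesis uses (an assumption).
    """
    gy, gx = np.gradient(fm)
    return np.sqrt(gx ** 2 + gy ** 2)


def layer_dissimilarity(fmaps_a, fmaps_b):
    """Minimum edge-feature difference between two adjacent layers.

    fmaps_a, fmaps_b: arrays of shape (C_a, H, W) and (C_b, H, W).
    For each channel of layer B, take the minimum mean absolute
    difference to any edge map of layer A, then average. A small value
    means the two layers extract similar spatial features, making one
    of them a candidate for removal during layer pruning.
    """
    edges_a = np.stack([edge_map(c) for c in fmaps_a])
    edges_b = np.stack([edge_map(c) for c in fmaps_b])
    per_channel = [
        min(np.abs(eb - ea).mean() for ea in edges_a) for eb in edges_b
    ]
    return float(np.mean(per_channel))
```

Note that a layer compared with itself yields a dissimilarity of zero, and a constant intensity offset leaves the edge features (and hence the measure) essentially unchanged, which matches the intent of comparing spatial structure rather than raw activations.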
Keywords/Search Tags: Sensitivity, Integration, Spatial similarity, Layer pruning, Deployment