
Research and Application of a Model Compression Algorithm Based on Pruning, Quantization and Knowledge Distillation

Posted on: 2021-05-27
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Liu
Full Text: PDF
GTID: 2518306470469034
Subject: Computer technology
Abstract/Summary:
The deep model's ever-growing demand for computing and storage resources has become a constraint that keeps deep models from being deployed on resource-constrained devices, and this has triggered a wave of academic and industrial research on compressing and optimizing deep models. In the early stages, researchers approached the problem from different angles, such as low-rank decomposition, weight pruning, weight quantization, knowledge distillation and the design of compact networks, and achieved remarkable results. As the research deepened, researchers gradually realized that these schemes overlap to a certain degree, and that a better compression effect can be obtained by combining their strengths while avoiding their weaknesses.

Among the existing model compression methods, weight pruning sparsifies the model by "subtracting" redundant parameters or convolution kernels, whereas knowledge distillation uses a "teacher network" as a guide that continuously "adds" knowledge to the "student network", thereby improving the student network's accuracy. This "subtraction" and "addition" appear to point in opposite directions, yet they reach the same compression and optimization goal, and combining the two improves the model "bidirectionally". Different from these two methods, weight quantization works by reducing the number of bits needed to represent each weight. Because it is directly tied to the hardware resource limits of the deployment target, almost every algorithm must be combined with weight quantization before it can be deployed on a given hardware platform.

Following these ideas, this thesis combines the advantages of weight pruning, weight quantization and knowledge distillation into a more efficient model compression framework, and applies it to the compression of the classical YOLO object detection network so that it can be deployed on embedded terminal devices. The main work of this thesis includes the following two aspects:

1. A compression and optimization framework for deep neural network models based on the combination of weight pruning, weight quantization and knowledge distillation is proposed, which compresses the model from multiple angles. The framework first applies pruning and weight quantization to coarsely compress the original network into a "student network", and then uses the original network as the "teacher network" to guide the training of the compressed student network, thereby improving the student network's accuracy, as sketched in the code after the abstract. Experiments show that, compared with models trained without the compression framework, the proposed scheme effectively reduces the amount of network computation on VGG-16 and a customized classification network while maintaining accuracy.

2. The fusion framework is applied to a YOLO-based apple defect detection model, which solves the problem that the YOLO model is too large to be ported to embedded devices. First, the YOLO framework is used to detect apple defects; then the deep model compression techniques are applied to YOLO's backbone network, and the optimized detection network is ported to a Jetson TX2 device. Experiments show that the compressed YOLO model can be deployed on the Jetson TX2, and that the processing frame rate is significantly improved at the cost of only a small loss of accuracy, without affecting the detection results.
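The following is a minimal, illustrative PyTorch sketch of the combined pipeline described in work item 1, for the classification setting (VGG-16 or the customized classification network): a copy of the original network is magnitude-pruned and prepared for quantization-aware training to form the "student network", while the original network serves as the "teacher network" during distillation. The helper names, sparsity level, temperature and loss weighting are assumptions made for illustration; they are not values or code taken from the thesis.

```python
# Illustrative sketch of the prune -> quantize -> distill pipeline.
# Sparsity, temperature and loss weighting are assumed values, not the thesis'.
import copy

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.nn.utils.prune as prune


def build_student(teacher: nn.Module, sparsity: float = 0.5) -> nn.Module:
    """Coarse compression: L1 magnitude-prune the convolution weights of a
    copy of the teacher, then prepare it for quantization-aware training."""
    student = copy.deepcopy(teacher)
    for module in student.modules():
        if isinstance(module, nn.Conv2d):
            prune.l1_unstructured(module, name="weight", amount=sparsity)
            # Bake the zeroed weights into the tensor; a full pipeline would
            # normally keep the mask and re-apply it during fine-tuning.
            prune.remove(module, "weight")
    # Quantization-aware training: insert fake-quantization observers so the
    # fine-tuned student can later be converted to an int8 model.
    student.train()
    student.qconfig = torch.quantization.get_default_qat_qconfig("fbgemm")
    torch.quantization.prepare_qat(student, inplace=True)
    return student


def distillation_loss(student_logits, teacher_logits, labels,
                      T: float = 4.0, alpha: float = 0.7):
    """Soft-target distillation loss: KL divergence toward the teacher's
    softened outputs plus cross-entropy on the ground-truth labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard


def distill_step(teacher, student, optimizer, images, labels):
    """One training step: the frozen teacher 'guides' the pruned,
    quantization-aware student."""
    teacher.eval()
    with torch.no_grad():
        teacher_logits = teacher(images)
    student_logits = student(images)
    loss = distillation_loss(student_logits, teacher_logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

After distillation fine-tuning, torch.quantization.convert() would turn the fake-quantized student into a true int8 model for deployment. In the detection setting of work item 2, the thesis applies the same techniques to YOLO's backbone network before porting the optimized detector to the Jetson TX2.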
Keywords/Search Tags: Model compression, Weight pruning, Knowledge distillation, Deep learning, YOLO