
Research On Efficient Knowledge Distillation Methods

Posted on: 2022-11-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Pei
Full Text: PDF
GTID: 2568306326476794
Subject: Computer Science and Technology

Abstract/Summary:
Knowledge Distillation (KD) is a simple but effective method for model compression: the knowledge of a well-trained teacher network (a large neural network) is used to assist in training a student network (a small neural network), thereby effectively improving the student network's performance. Although knowledge distillation has made significant progress, several difficulties remain: (1) when there is a large capacity gap between the teacher network (Teacher) and the student network (Student), the Student cannot effectively learn the Teacher's knowledge, which leads to poor classification performance; (2) although a great deal of work has studied different types of the Teacher's knowledge, which type is most conducive to the Student's learning is still an open question; (3) misinformation in the Teacher may harm the Student's learning; (4) online distillation can cause serious homogeneity among Students, which hinders further improvement of their performance. To address these issues, this thesis focuses on efficient knowledge distillation methods for image classification. The main contributions are as follows:

(1) Self-boosting Feature Distillation is proposed. This method addresses the Teacher-Student gap from a new perspective: the Student's own information is used to improve the Student's learning ability, thereby alleviating the gap between Teacher and Student. Integrated features are constructed to imitate the Teacher's original features, and a new self-distillation strategy is proposed that uses only the Student's parameters from the previous epoch to update its current parameters, without increasing memory usage or forward propagation (a hedged sketch of such an update is given below). Moreover, the effectiveness of self-boosting feature distillation is explained through Richardson extrapolation, which shows that it improves the Student's order of convergence. Extensive experiments show that the proposed method delivers excellent distillation performance, significantly better than current state-of-the-art knowledge distillation methods.

(2) An online distillation method based on contrastive learning is proposed. To counter the serious homogeneity among Students in online distillation, the similarity between samples is treated as the knowledge the Students learn from one another, which mitigates homogeneity; a new loss function is also designed to appropriately increase the diversity among Students (an illustrative similarity loss is sketched below). In addition, the ensemble of multiple Students' outputs is used as the Teacher and the remaining network as the leader, and an additional self-distillation loss is applied to the leader network to alleviate the Teacher-Student gap. Experimental results show that the method effectively improves the performance of both the leader and the ensembled Teacher.

(3) An offline distillation method based on parameter-free loss estimation is proposed. Because the classical distillation loss is difficult to optimize and its temperature hyperparameter is hard to choose (the classical loss is sketched below for reference), four new parameter-free losses based on information normalization are designed, which greatly improve the distillation effect. In addition, since knowledge distillation is essentially a function-fitting problem over discrete data points, an intra-class neighborhood sampling strategy is proposed to increase the density and richness of the data, enabling the Student to capture richer knowledge from the Teacher. Experiments on multiple datasets show that the proposed method greatly improves the Student's performance.
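The abstract does not give the exact self-distillation update used in contribution (1); the sketch below is a hypothetical illustration only. It blends the Student's current parameters with a snapshot taken at the end of the previous epoch, in the spirit of the Richardson-extrapolation interpretation; the function name `self_boost_step` and the coefficient `beta` are assumptions, not the thesis's formulation.

```python
# Hypothetical sketch of contribution (1): reuse only the Student's previous-epoch
# parameters to update its current parameters. The extrapolation form and the
# coefficient `beta` are illustrative assumptions, not the thesis's exact rule.
import torch

@torch.no_grad()
def self_boost_step(student, prev_epoch_params, beta=0.5):
    # Push current parameters away from the previous-epoch snapshot,
    # then return a fresh snapshot for the next epoch. No extra forward
    # pass and no second copy of the network are needed.
    for name, p in student.named_parameters():
        p.add_(beta * (p - prev_epoch_params[name]))
    return {n: p.detach().clone() for n, p in student.named_parameters()}
```

In use, the returned snapshot would be fed back in at the end of each epoch, e.g. `prev = self_boost_step(student, prev)` after initializing `prev` from the Student's starting parameters.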
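Contribution (2) treats the similarity between samples as the knowledge that peer Students exchange. The abstract does not specify the similarity measure or the matching loss, so the sketch below uses a batch-wise cosine-similarity matrix matched with a mean-squared error purely as an illustration; the function names are assumptions.

```python
# Illustrative sketch of contribution (2): pairwise sample similarity within a
# batch serves as the knowledge two peer Students exchange. The cosine-similarity
# matrix and the MSE matching loss are assumptions made for illustration.
import torch
import torch.nn.functional as F

def batch_similarity(features):
    # Row-normalize the feature matrix, then take pairwise cosine similarities.
    f = F.normalize(features, dim=1)
    return f @ f.t()

def peer_similarity_loss(student_a_feats, student_b_feats):
    # Penalize disagreement between the two Students' relational views of the
    # batch; the peer's similarities are detached and treated as a fixed target.
    return F.mse_loss(batch_similarity(student_a_feats),
                      batch_similarity(student_b_feats.detach()))
```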
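For reference on contribution (3), the classical soft-target distillation loss whose temperature hyperparameter the proposed parameter-free losses aim to eliminate can be written as follows; the default values of `T` and `alpha` are illustrative, and the thesis's own information-normalization losses are not reproduced here.

```python
# The classical distillation loss referenced in contribution (3): a temperature-
# scaled KL term on soft targets plus cross-entropy on hard labels. T and alpha
# are exactly the hyperparameters the parameter-free losses avoid tuning.
import torch
import torch.nn.functional as F

def classical_kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradient magnitudes stay comparable across T
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```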
Keywords/Search Tags:Self-boosting, Feature Distillation, Online Distillation, Offline Distillation