In recent years, deep convolutional neural networks have achieved great success in image classification, object detection, semantic segmentation, and other tasks. However, the advantages of CNNs come with deep model structures, which require extensive computing resources and memory, hindering their deployment in real production. It is therefore crucial to explore ways of reducing model size while barely sacrificing performance. In this paper, we work on a model acceleration technique called knowledge distillation and propose two methods to improve its performance; the proposed methods achieve state-of-the-art results. The key idea of knowledge distillation is to transfer knowledge from a deep teacher model to a shallower student model. Benefiting from the transferred knowledge, the performance of the student can be improved and brought close to that of the teacher. If the student performs exactly as well as the teacher, we can consider the teacher to have been compressed into a lightweight student model. In this paper, we claim that it is important to transfer feature knowledge at the down-sampling points of a network. Meanwhile, we propose to decompose the transfer process into two steps: backbone learning and task-head fine-tuning. A stage-by-stage knowledge distillation is then applied, which facilitates progressive feature learning from teacher to student. Considering that a gap still exists between the student and teacher networks, we introduce an assistant model to reduce this gap. Specifically, the student is trained to mimic the hidden feature maps of the teacher, and the assistant aids this process by learning the residual error between them. In this way, the student and assistant complement each other to obtain better knowledge from the teacher.
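The residual-assisted feature mimicking described above can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the array shapes, variable names, and the plain mean-squared-error objective are all assumptions made for clarity.

```python
import numpy as np

# Hypothetical hidden feature maps at one down-sampling point,
# with shape (batch, channels, height, width). These values stand in
# for the outputs of teacher, student, and assistant networks.
rng = np.random.default_rng(0)
teacher_feat = rng.standard_normal((2, 8, 4, 4))
student_feat = rng.standard_normal((2, 8, 4, 4))
assistant_feat = rng.standard_normal((2, 8, 4, 4))

def mse(a, b):
    """Mean-squared error between two feature maps."""
    return float(np.mean((a - b) ** 2))

# The student is trained to mimic the teacher's hidden feature map.
student_loss = mse(student_feat, teacher_feat)

# The assistant learns the residual error the student leaves behind,
# so that (student + assistant) together approximate the teacher.
residual_target = teacher_feat - student_feat
assistant_loss = mse(assistant_feat, residual_target)

# Combined objective for one distillation stage; stage-by-stage training
# would apply such a loss at each down-sampling point in turn.
total_loss = student_loss + assistant_loss
```

In an actual training loop these losses would be minimized by gradient descent over the student and assistant parameters; the sketch only shows how the residual target couples the two models.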