As an efficient model compression technique, knowledge distillation (KD) has received extensive attention in many fields of deep learning, such as computer vision, natural language processing, and speech recognition. The key idea of KD is to extract knowledge from a large, complex teacher network and transfer it to a small student network during subsequent training. However, most existing knowledge distillation schemes learn only a single type of knowledge, extracted from instance features or instance relations through one specific distillation strategy, and do not explore transferring different types of knowledge through different distillation strategies. This thesis makes the following three contributions:

(1) To address the difficulty of training complex, large-scale teacher networks, a collaborative knowledge distillation via multi-knowledge transfer (CKD-MKT) method is proposed. CKD-MKT does not require a large teacher network to be trained in advance. Instead, it uses a multi-knowledge transfer framework that combines self-distillation with online distillation: multiple student networks guide one another through mutual collaboration and self-learning, effectively integrating different types of knowledge so that the students learn from both individual instances and instance relationships. Model performance is improved through mutual learning between student networks and the self-learning of each student network. Experiments on five image datasets show that the proposed CKD-MKT method significantly outperforms state-of-the-art knowledge distillation methods.

(2) Although the attention mechanism has a strongly positive effect on model performance during knowledge distillation, the information a deep neural network attends to differs across its intermediate layers. Specifically, in shallow layers the network focuses on edges, positions, and other specific information about objects (such as eyes, nose, etc.); as the network deepens, the information it attends to becomes increasingly abstract (such as an entire face). Therefore, the hierarchical multi-attention transfer for knowledge distillation (HMAT) method uses different attention knowledge at different layers of the neural network, so that the teacher network can transfer the distinct information captured by each layer to the student network. In this way, the student network acquires more comprehensive attentional knowledge from the teacher network. The effectiveness of HMAT is verified on three different tasks: image classification, object detection, and image retrieval.

(3) Existing relational knowledge distillation methods construct relations over single samples or the features they generate, and transfer the constructed relations as knowledge from the teacher network to the student network during training. Relations built this way cannot distinguish the key regions in feature maps. Therefore, the attention-based sample correlations for knowledge distillation (ASCKD) method constructs sample relations on attention maps. By focusing on important regions of each sample, ASCKD establishes correlations between samples: it connects the regions the model attends to and captures the relationships between the key regions of any two samples. As a result, during knowledge distillation the teacher network can deliver richer and more robust knowledge to the student network.
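The mutual-learning component of online distillation described in (1) can be sketched minimally in NumPy. This follows the standard deep-mutual-learning formulation (each student minimizes its own cross-entropy plus a KL term toward the other student's softened prediction); the two-student setup, temperature T=3, and equal loss weighting are illustrative assumptions, not the thesis's exact formulation.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_div(p, q, eps=1e-12):
    """KL(p || q), averaged over the batch."""
    return np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1))

def cross_entropy(p, labels, eps=1e-12):
    """Cross-entropy of predicted probabilities against integer labels."""
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + eps))

def mutual_learning_losses(logits_a, logits_b, labels, T=3.0):
    """Each student is trained on its own supervised loss plus a KL term
    that pulls it toward the other student's temperature-softened output."""
    pa, pb = softmax(logits_a), softmax(logits_b)
    sa, sb = softmax(logits_a, T), softmax(logits_b, T)
    loss_a = cross_entropy(pa, labels) + (T ** 2) * kl_div(sb, sa)
    loss_b = cross_entropy(pb, labels) + (T ** 2) * kl_div(sa, sb)
    return loss_a, loss_b

# Toy batch: 4 samples, 3 classes.
rng = np.random.default_rng(0)
logits_a = rng.normal(size=(4, 3))
logits_b = rng.normal(size=(4, 3))
labels = np.array([0, 2, 1, 0])
la, lb = mutual_learning_losses(logits_a, logits_b, labels)
```

In practice both losses would be backpropagated through their respective student networks in the same training step, which is what lets the students "guide each other" without a pre-trained teacher.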
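The layer-wise attention transfer in (2) can be illustrated with activation-based attention maps, a common construction in which a feature map is collapsed over channels by summing powered absolute activations. The layer shapes, the exponent p=2, and the plain sum of per-layer squared distances below are illustrative assumptions, not the exact HMAT objective.

```python
import numpy as np

def attention_map(feat, p=2):
    """Collapse a (C, H, W) feature map to an (H*W,) attention vector by
    summing |activation|^p over channels, then L2-normalizing."""
    a = (np.abs(feat) ** p).sum(axis=0).ravel()
    return a / (np.linalg.norm(a) + 1e-12)

def hierarchical_attention_loss(teacher_feats, student_feats):
    """Sum of per-layer distances between teacher and student attention
    maps, so each layer's (increasingly abstract) attention is transferred."""
    total = 0.0
    for ft, fs in zip(teacher_feats, student_feats):
        total += np.linalg.norm(attention_map(ft) - attention_map(fs)) ** 2
    return total

rng = np.random.default_rng(1)
# Three intermediate layers; channel counts may differ between networks,
# but spatial sizes match at each transfer point.
teacher = [rng.normal(size=(64, 8, 8)),
           rng.normal(size=(128, 4, 4)),
           rng.normal(size=(256, 2, 2))]
student = [rng.normal(size=(16, 8, 8)),
           rng.normal(size=(32, 4, 4)),
           rng.normal(size=(64, 2, 2))]
loss = hierarchical_attention_loss(teacher, student)
```

Because the channel dimension is summed out, the teacher and student need not have the same width at each layer, only matching spatial resolutions.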
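The sample correlations in (3) can be sketched as pairwise similarities between per-sample attention maps within a batch, with the student trained to match the teacher's correlation matrix. The cosine-similarity relation and mean-squared matching loss below are illustrative choices, not necessarily the relation function used by ASCKD.

```python
import numpy as np

def batch_attention_maps(feats, p=2):
    """(B, C, H, W) features -> (B, H*W) L2-normalized attention maps."""
    a = (np.abs(feats) ** p).sum(axis=1).reshape(feats.shape[0], -1)
    return a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-12)

def sample_correlations(feats):
    """Pairwise cosine similarities between samples' attention maps:
    entry (i, j) relates the key regions of samples i and j."""
    a = batch_attention_maps(feats)
    return a @ a.T

def relation_kd_loss(teacher_feats, student_feats):
    """Train the student's sample-correlation matrix toward the teacher's."""
    rt = sample_correlations(teacher_feats)
    rs = sample_correlations(student_feats)
    return np.mean((rt - rs) ** 2)

rng = np.random.default_rng(2)
t_feats = rng.normal(size=(4, 64, 8, 8))   # teacher batch features
s_feats = rng.normal(size=(4, 16, 8, 8))   # student batch features
loss = relation_kd_loss(t_feats, s_feats)
```

Building the relation on attention maps rather than raw features is what lets the transferred correlations emphasize the key regions of each sample instead of treating all spatial locations equally.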