
Adaptive Multi-Teacher Multi-Student Knowledge Distillation Learning

Posted on: 2020-01-23
Degree: Master
Type: Thesis
Country: China
Candidate: J L Song
Full Text: PDF
GTID: 2417330596968174
Subject: Software engineering
Abstract/Summary:
Mimicking the real scenario of human teaching, knowledge distillation guides a model with both the supervision from ground-truth labels and the predictions of other models. The guidance from ground-truth labels is analogous to a student gaining experience by solving problems and checking the answers, while the help from another model is analogous to a teacher transferring knowledge to the student. This general teacher-student learning paradigm has two main directions of application. One uses a strong teacher model with a large number of parameters to improve the performance of a compact, weaker student model; the other uses the teacher model to transfer extra information to the student model.

However, most prior studies of distillation learning assume a single teacher, neglecting the fact that a student can learn from multiple teachers simultaneously, or simply treat every teacher as equally important, failing to reveal the differing importance of teachers. To bridge this gap, we propose a concise and effective adaptive learning framework that determines the importance of different teacher models for specific data examples and fuses their knowledge to benefit the learning of the student model. Furthermore, we introduce mutual learning among multiple students to extend the adaptive learning framework into a novel multi-teacher multi-student knowledge distillation.

We apply the proposed frameworks to the tasks of cold-start document-level sentiment classification and image classification. Experiments on public benchmark datasets and real-world datasets show that knowledge distillation is effective for the studied problems and that our proposed approaches gain consistent improvements.
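To make the adaptive multi-teacher idea concrete, the following is a minimal PyTorch-style sketch of a distillation loss that combines ground-truth supervision with a per-example weighted fusion of several teachers' soft predictions. The weighting scheme shown (each teacher scored by its confidence in the true class) is an illustrative assumption, not necessarily the exact scheme used in the thesis; the function name, temperature, and alpha are likewise hypothetical.

import torch
import torch.nn.functional as F

def adaptive_multi_teacher_kd_loss(student_logits, teacher_logits_list, labels,
                                   temperature=4.0, alpha=0.5):
    # Hard-label supervision: the student "solves problems and checks the answers".
    ce_loss = F.cross_entropy(student_logits, labels)

    # Per-example teacher importance (assumed scheme): score each teacher by its
    # softened probability on the true class, then normalize across teachers.
    confidences = torch.stack([
        F.softmax(t / temperature, dim=-1)
         .gather(1, labels.unsqueeze(1)).squeeze(1)
        for t in teacher_logits_list
    ], dim=1)                                # shape: (batch, num_teachers)
    weights = F.softmax(confidences, dim=1)  # adaptive importance per example

    # Fuse the teachers' soft targets with the adaptive weights.
    teacher_probs = torch.stack([
        F.softmax(t / temperature, dim=-1) for t in teacher_logits_list
    ], dim=1)                                # shape: (batch, num_teachers, classes)
    fused_targets = (weights.unsqueeze(-1) * teacher_probs).sum(dim=1)

    # Soft-label supervision: KL divergence to the fused teacher distribution,
    # scaled by T^2 as is standard in distillation.
    kd_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        fused_targets,
        reduction="batchmean",
    ) * temperature ** 2

    return alpha * ce_loss + (1.0 - alpha) * kd_loss

In the multi-student extension, a full training step would additionally include a mutual-learning term between students, for example a symmetric KL divergence between each pair of students' softened outputs.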
Keywords/Search Tags: Deep Learning, Knowledge Distillation, Sentiment Classification, Image Classification