Breast cancer has become the most prevalent cancer worldwide and the most common cancer among women. At present, breast cancer is diagnosed mainly by histopathological analysis, of which the most critical step is biopsy. However, with the growing number of breast cancer patients, relying solely on manual diagnosis can no longer meet patient needs. In addition, subjective bias, fatigue, and other factors that vary across physicians often lead to inefficiency or even misdiagnosis. Deep learning can accurately process and analyze large volumes of image data in a short time, and applying it to medical image analysis can greatly reduce the burden on doctors. However, deep learning networks are usually large, and their deployment can consume substantial computational resources. It is therefore desirable to transfer the knowledge of large-scale deep learning networks to relatively small models, reducing the computational resources required at deployment time. The main goal of this study is to improve knowledge transfer from large-scale models to small-scale models by combining multi-teacher knowledge distillation with a two-stage knowledge distillation strategy. The specific contributions are as follows:

(1) A few-shot knowledge distillation method for the classification of breast cancer pathological images is studied. A progressive network grafting method is used to realize knowledge distillation in a few-shot setting. In the first stage, student blocks are grafted one by one into the teacher network and trained jointly with the remaining teacher blocks, and only the parameters of the grafted student block are updated during training (a code sketch of this stage is given after the abstract). In the second stage, the trained student blocks are grafted into the teacher network in turn so that they adapt to one another and finally replace the teacher network entirely, yielding a more lightweight network structure. Because only few-shot data are used for training, the resources consumed when the teacher network transfers knowledge to the student network are greatly reduced. At the same time, the student network outperforms the original student network, and its decision performance is nearly equivalent to that of the teacher network.

(2) A simple and effective soft-target integration method for multi-teacher networks is proposed. The influence of each teacher model during knowledge distillation is controlled by a weight, ensuring that teacher models with better decision performance play a more significant role in guiding the student network (see the second sketch below). Experimental results on the BreaKHis dataset show that the proposed multi-teacher soft-label integration method is superior for few-shot knowledge distillation. Moreover, when multi-teacher knowledge distillation is combined with the few-shot two-stage progressive distillation strategy, increasing the number of teacher networks helps the student network learn the teachers' knowledge, and the student network can even achieve better performance than all of the teacher networks. At the same time, the student network is structurally lighter than the teacher networks, that is, it has far fewer channels, so a lighter student network is obtained while its classification accuracy exceeds that of all teacher networks.

(3) A multi-teacher knowledge distillation method incorporating the attention mechanism is proposed. To improve the efficiency of knowledge transfer from teacher models in multi-teacher knowledge distillation, this study proposes a multi-teacher soft-target "noise reduction" method based on the idea of the attention mechanism (see the third sketch below). The proposed "noise reduction" module reduces the noise introduced by multiple teacher models during distillation and is well suited to scenarios with multiple teacher models in few-shot multi-teacher knowledge distillation.
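The following is a minimal sketch of the first stage of the progressive network grafting described in (1). It assumes the teacher network is available as an ordered list of sequentially composable blocks whose last block outputs class logits; the function name train_grafted_block, the optimizer, and the hyperparameters are illustrative assumptions, not values taken from the thesis.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def train_grafted_block(teacher_blocks, student_block, block_idx, loader,
                        epochs=1, lr=1e-3):
    """Stage one: graft one student block into the frozen teacher and train only it."""
    # Hybrid network: copies of the teacher blocks with the student block grafted in.
    hybrid = nn.ModuleList([copy.deepcopy(b) for b in teacher_blocks])
    hybrid[block_idx] = student_block
    for p in hybrid.parameters():          # freeze everything ...
        p.requires_grad = False
    for p in student_block.parameters():   # ... except the grafted student block
        p.requires_grad = True
    optimizer = torch.optim.Adam(student_block.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:                # few-shot training data
            out = x
            for block in hybrid:           # sequential forward through the hybrid network
                out = block(out)
            loss = F.cross_entropy(out, y)  # assumes the last block produces logits
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return student_block
```

In the second stage, the blocks returned by such calls would be grafted back in turn and fine-tuned together until they fully replace the teacher, as described in (1).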
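Below is a minimal sketch of the weighted soft-target integration described in (2), written as a standard temperature-scaled distillation loss against a weighted mixture of teacher outputs. The weight normalisation, the temperature, and the mixing coefficient alpha are assumptions for illustration; the thesis may weight and combine the terms differently.

```python
import torch
import torch.nn.functional as F

def multi_teacher_kd_loss(student_logits, teacher_logits_list, weights, labels,
                          temperature=4.0, alpha=0.7):
    """Distillation loss against a weighted mixture of teacher soft targets."""
    weights = torch.tensor(weights) / sum(weights)               # normalise teacher weights
    soft_targets = sum(w * F.softmax(t / temperature, dim=1)     # weighted soft-target mixture
                       for w, t in zip(weights, teacher_logits_list))
    kd = F.kl_div(F.log_softmax(student_logits / temperature, dim=1),
                  soft_targets, reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)                 # hard-label supervision
    return alpha * kd + (1 - alpha) * ce
```

Here the weights are simply passed in; in the study they are meant to reflect each teacher's decision quality, so that stronger teachers contribute more to the mixed soft target.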
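Finally, a minimal sketch of one plausible form of the attention-based "noise reduction" in (3): per-sample attention weights over the teachers are derived from the similarity between each teacher's soft prediction and the student's, and are then used to aggregate the teachers' soft targets. The abstract does not specify how the attention scores are computed, so this scoring rule is an assumption made for illustration only.

```python
import torch
import torch.nn.functional as F

def attention_denoised_targets(student_logits, teacher_logits_list, temperature=4.0):
    """Aggregate teacher soft targets with per-sample attention weights."""
    student_p = F.softmax(student_logits / temperature, dim=1)            # (B, C)
    teacher_p = torch.stack([F.softmax(t / temperature, dim=1)
                             for t in teacher_logits_list], dim=1)        # (B, T, C)
    scores = torch.einsum("bc,btc->bt", student_p, teacher_p)             # similarity to each teacher
    attn = F.softmax(scores, dim=1)                                       # (B, T) attention weights
    return torch.einsum("bt,btc->bc", attn, teacher_p)                    # "denoised" soft targets
```

The resulting targets could replace the fixed-weight mixture in the previous sketch, down-weighting teachers whose predictions act as noise for a given sample.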