With the rapid development of deep learning, making models lightweight has become an urgent requirement for deployment in practical scenarios. As a typical model-compression algorithm, knowledge distillation is an important topic in deep learning and has received extensive attention and development in academia and industry in recent years. Most current mainstream knowledge distillation algorithms are based on feature and relation distillation, and they still face several difficulties and challenges. First, they are overly sensitive to training hyper-parameters, and different hyper-parameter settings may lead to large differences in results. Second, the teacher and student models must be well matched; an inappropriate teacher may make the student's training unstable or bring no improvement at all. Third, the inherent information of the data and the intermediate-layer features are not fully exploited, leaving the student model with poor generalization ability and poor robustness to interference. There is therefore still room for research and exploration on the efficiency and generality of knowledge distillation.

To address these three problems, this thesis proposes an efficient knowledge distillation algorithm based on contrastive learning and attention mechanisms, and designs an adaptive weight-update mechanism of practical value. By introducing spatial and channel attention mechanisms, the feature distillation module can efficiently extract the most meaningful knowledge from rich features, while a task-oriented loss function improves the distillation effect in a data-driven manner. Taking contrastive learning as an auxiliary task, this thesis designs a contrastive branch with a projection-head structure that selectively transmits self-supervised information to the student model, allowing the student to learn the structured information of the data and improving its generalization ability and robustness. The multiple branches are further modeled as a multi-task learning problem: the learning progress of each task is judged by its generalization on the validation set, and an adaptive mechanism dynamically updates the weights of the loss terms. This thesis also proposes a novel initialization strategy for the student model, aiming to fundamentally close the gap between the two models. Finally, through reasonably and scientifically designed experiments, this thesis demonstrates the high accuracy of the proposed algorithm on multiple authoritative classification datasets and verifies the effectiveness of each functional module.
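The abstract does not give implementation details for the attention-based feature distillation module. The following is a minimal PyTorch sketch of one plausible design, assuming squeeze-and-excitation-style channel gating, a pooled spatial mask, and teacher/student feature maps of matching shape (a 1x1 convolutional adapter would be needed otherwise); the class and method names here are hypothetical, not the thesis's own.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionFeatureDistill(nn.Module):
    """Sketch: weight teacher/student feature maps with channel and
    spatial attention before computing a distillation loss. The
    pooling-based attention is an assumption, not the thesis's
    exact design."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        # Channel attention: squeeze-and-excitation-style gating (assumed).
        self.channel_fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def channel_attention(self, x):
        # Global average pool over space -> MLP -> sigmoid gate per channel.
        w = torch.sigmoid(self.channel_fc(x.mean(dim=(2, 3))))
        return w.unsqueeze(-1).unsqueeze(-1)                # (B, C, 1, 1)

    @staticmethod
    def spatial_attention(x):
        # Per-location importance from channel-wise statistics.
        return torch.sigmoid(x.mean(dim=1, keepdim=True))   # (B, 1, H, W)

    def forward(self, f_student, f_teacher):
        # Emphasize the channels/regions the teacher finds informative,
        # then match the student to the teacher on those parts.
        ca = self.channel_attention(f_teacher)
        sa = self.spatial_attention(f_teacher)
        return F.mse_loss(f_student * ca * sa, f_teacher * ca * sa)
```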
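The contrastive branch is likewise only named in the abstract. A minimal sketch under common assumptions follows: a SimCLR-style two-layer projection head and an InfoNCE objective that treats the teacher's embedding of the same image as the positive and the rest of the batch as negatives; the thesis's actual head architecture and objective may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProjectionHead(nn.Module):
    """Two-layer projection head (SimCLR-style; assumed architecture)."""

    def __init__(self, in_dim, proj_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, in_dim),
            nn.ReLU(inplace=True),
            nn.Linear(in_dim, proj_dim),
        )

    def forward(self, x):
        # L2-normalize so the dot product below is a cosine similarity.
        return F.normalize(self.net(x), dim=1)

def contrastive_kd_loss(z_student, z_teacher, temperature=0.1):
    """InfoNCE-style loss: for each sample, the teacher's embedding of
    the same image is the positive; other batch samples are negatives
    (an assumed formulation of the self-supervised transfer)."""
    logits = z_student @ z_teacher.t() / temperature   # (B, B) similarities
    targets = torch.arange(z_student.size(0), device=z_student.device)
    return F.cross_entropy(logits, targets)
```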
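For the adaptive weight-update mechanism, the abstract states only that loss weights are updated from generalization on the validation set. One plausible realization, sketched below purely as an assumption, is to give exponentially larger weight to tasks whose validation metric improves most slowly, keeping the total weight mass constant.

```python
import math

def update_loss_weights(weights, val_metrics, prev_val_metrics, lr=0.1):
    """Hypothetical update rule: slower-improving tasks (by validation
    metric) receive larger loss weights in the next epoch. The thesis's
    actual mechanism is not specified in the abstract."""
    # Relative improvement rate of each task since the last check.
    rates = [(m - p) / (abs(p) + 1e-8)
             for m, p in zip(val_metrics, prev_val_metrics)]
    # Exponentially favor tasks that improved least.
    scores = [math.exp(-lr * r) for r in rates]
    scaled = [w * s for w, s in zip(weights, scores)]
    # Renormalize so the total weight mass stays equal to the task count.
    total = sum(scaled)
    return [w * len(weights) / total for w in scaled]

# Example: the first task (0.72 -> 0.73) improved more slowly than the
# second (0.60 -> 0.66), so its weight grows slightly above 1.0.
print(update_loss_weights([1.0, 1.0], [0.73, 0.66], [0.72, 0.60]))
```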
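The student-initialization strategy is also only named, not described. A simple hypothetical scheme in the same spirit of closing the teacher-student gap is to warm-start the student from the teacher wherever parameter names and shapes coincide:

```python
import torch.nn as nn

def init_student_from_teacher(student: nn.Module, teacher: nn.Module):
    """Hypothetical initialization: copy every teacher parameter whose
    name and shape match into the student. This is an assumed stand-in
    for the thesis's (unspecified) novel strategy."""
    t_state = teacher.state_dict()
    s_state = student.state_dict()
    for name, tensor in s_state.items():
        if name in t_state and t_state[name].shape == tensor.shape:
            s_state[name] = t_state[name].clone()
    student.load_state_dict(s_state)
```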