With the rapid development of Internet technology in recent years, various application platforms have emerged on which users can freely express their opinions. Texts carrying subjective emotions have become an essential part of the massive data on the Internet. Sentiment analysis has therefore attracted wide attention and can be applied in many scenarios, such as public opinion monitoring, personalized recommendation, and question answering. The sentiment classification task, which aims to determine the sentiment polarity of a given text, is considered a foundational task in sentiment analysis. However, mainstream pre-trained language models suffer from a large number of parameters and slow inference, and the common remedy is knowledge distillation. This thesis focuses on the shortcomings of knowledge distillation methods in the sentiment classification task; the specific research contents are as follows:

(1) The learning ability of the single student in the traditional distillation structure is limited by the student model's representation capacity and by the number of training samples. This thesis therefore extends the one-to-one distillation structure to a one-teacher, multiple-student structure, in which different inputs lead each student to learn different knowledge and features from the teacher during distillation. Since labelling samples requires substantial manpower and resources, the method also introduces a large number of unlabeled samples into the distillation process, as sketched below. Experimental results demonstrate that the one-teacher, multiple-student knowledge distillation method outperforms mainstream distillation models on sentiment classification tasks.
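The following is a minimal sketch of such a per-student distillation loop, assuming PyTorch-style classifiers. Names such as `teacher`, `students`, and the data loaders are illustrative placeholders rather than components of the thesis, and unlabeled batches are assumed to arrive with `labels = None` so that only the teacher's soft labels drive the loss.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels=None, T=2.0, alpha=0.5):
    """KL term on temperature-softened logits; add hard-label CE only when labels exist."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    if labels is None:                       # unlabeled sample: learn from teacher soft labels only
        return soft
    return alpha * soft + (1.0 - alpha) * F.cross_entropy(student_logits, labels)

def train_students(teacher, students, loaders, optimizers, device="cpu"):
    """Each student trains on its own data stream, so it absorbs different knowledge."""
    teacher.eval()
    for student, loader, opt in zip(students, loaders, optimizers):
        student.train()
        for inputs, labels in loader:        # labels may be None for unlabeled batches
            inputs = inputs.to(device)
            with torch.no_grad():
                teacher_logits = teacher(inputs)
            loss = distillation_loss(
                student(inputs), teacher_logits,
                labels.to(device) if labels is not None else None,
            )
            opt.zero_grad()
            loss.backward()
            opt.step()
```

Giving each student a different loader (for example, a different sampling of labeled and unlabeled texts) is what lets the student ensemble cover more of the teacher's knowledge than a single student could.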
(2) A single teacher in the traditional knowledge distillation structure contains limited knowledge. Different pre-trained language models are usually trained on different unsupervised corpora or with different pre-training tasks, so their knowledge sources and representation capabilities differ. To make full use of the knowledge in different teachers, this thesis proposes a multiple-teacher structure and leverages unlabeled samples to distil the knowledge of each teacher into a corresponding student. The final prediction is a weighted combination of the outputs of the student models (a weighting sketch follows this abstract). Experimental results indicate that the proposed method further improves on the one-teacher, one-student model while retaining the advantages of a small number of parameters and fast inference.

(3) The traditional knowledge distillation structure has been little explored in few-shot sentiment classification. Pre-trained language models used for few-shot learning have a very large number of parameters, so using them as the teacher during distillation is computationally expensive. To tackle this challenge, this thesis proposes a novel approach with both a senior and a junior teacher model. Specifically, the junior teacher predicts the unlabeled samples and the uncertainty of its predicted probabilities is computed; a threshold mechanism then decides whether the senior teacher is needed to produce new probabilities (a routing sketch also follows this abstract). The outputs of the senior and junior teachers are combined into the final soft labels used to distil a shallow student model. Experimental results show that the proposed method comes close to distilling directly from the senior teacher, while substantially reducing resource consumption and the access rate of the senior teacher.

In short, this thesis proposes a series of ensemble solutions to the challenges faced by knowledge distillation approaches in the sentiment classification task. Experimental results show that the three methods achieve significant improvements in performance and efficiency over the baseline models.
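For contribution (2), the inference step can be illustrated with a minimal sketch of weighted ensembling over the distilled students, again assuming PyTorch-style classifiers; `students` and `weights` are hypothetical names, and the weights could, for example, reflect each student's validation accuracy.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def ensemble_predict(students, inputs, weights):
    """Weighted average of per-student class probabilities -> final sentiment polarity."""
    assert len(students) == len(weights)
    total = float(sum(weights))
    probs = None
    for student, weight in zip(students, weights):
        student.eval()
        p = F.softmax(student(inputs), dim=-1) * (weight / total)
        probs = p if probs is None else probs + p
    return probs.argmax(dim=-1), probs       # predicted labels and ensemble probabilities
```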
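For contribution (3), the senior/junior routing idea can be sketched as follows, under stated assumptions: `junior` and `senior` stand for the two teacher models, predictive entropy serves as the uncertainty measure, and the threshold and mixing weight are hypothetical hyperparameters. Only samples whose junior-teacher entropy exceeds the threshold are sent to the senior teacher, which is what keeps its access rate low.

```python
import torch
import torch.nn.functional as F

def entropy(probs, eps=1e-12):
    """Predictive entropy of a probability distribution, used as an uncertainty score."""
    return -(probs * (probs + eps).log()).sum(dim=-1)

@torch.no_grad()
def build_soft_labels(junior, senior, inputs, threshold=0.5, mix=0.5):
    """Soft labels for unlabeled inputs; the senior teacher scores only uncertain samples."""
    junior_probs = F.softmax(junior(inputs), dim=-1)
    uncertain = entropy(junior_probs) > threshold          # which samples need the senior teacher
    soft_labels = junior_probs.clone()
    if uncertain.any():
        senior_probs = F.softmax(senior(inputs[uncertain]), dim=-1)
        soft_labels[uncertain] = mix * senior_probs + (1.0 - mix) * junior_probs[uncertain]
    access_rate = uncertain.float().mean().item()          # fraction of samples routed to the senior
    return soft_labels, access_rate
```

The returned soft labels would then feed the same distillation loss as in the first sketch to train the shallow student.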