In the current research environment, complex models usually outperform lighter models on the same datasets, so many researchers devote considerable effort to designing ever more complex and elaborate models. However, such models often run into resource constraints when deployed: although abundant computing resources can be used while training a model, the resources available to a program in a real application environment are limited. To make research models usable in practice, various model compression techniques have been developed, which reduce the computational resources a model consumes without sacrificing its accuracy.

As a research hotspot in model compression, knowledge distillation addresses this problem well. It greatly compresses the size of the model while minimizing the performance loss caused by compression, and the whole compression process is completed through a single distillation training run, which is simpler than other methods. Generally speaking, on the same dataset and with current training methods, it is difficult for a small model to match the performance of a large model. Knowledge distillation breaks this limitation: it borrows the idea of a teacher instructing a student, using the large model as a teacher to guide the learning of the small model and thereby push the small model to a new level. The performance gain of the small model after distillation depends mainly on the performance of the teacher model and on the distillation method, so it is necessary to select a teacher model with excellent performance before distillation training.

The main research scenario of this paper is sentiment classification, so the teacher model used for distillation is naturally BERT, which has shown its strength in many NLP tasks. At present, there are two main approaches to distilling the BERT model. The first is to distill BERT into a smaller BERT, in which the student model retains the Transformer structure. The other, which is the choice of this paper, is to distill BERT into a heterogeneous model such as a Bi-LSTM. Although the student's performance is not as good as in the former approach, in environments with extremely limited resources, if the accuracy of the distilled Bi-LSTM meets the application requirements, the Bi-LSTM model with far fewer parameters is clearly more practical.

The main contribution of this paper is that, building on existing BERT distillation schemes and combining them with the ideas of traditional knowledge distillation methods such as Factor Transfer and Similarity-Preserving Knowledge Distillation, we propose two schemes for distilling BERT into a Bi-LSTM: BERT to Bi-LSTM with FT and BERT to Bi-LSTM with SPKD. They are evaluated on the SST-2 and IMDB sentiment classification datasets and compared with the TinyBERT scheme and the distilled Bi-LSTM scheme proposed by Tang. The results show that BERT to Bi-LSTM with SPKD outperforms the other BERT-to-Bi-LSTM distillation schemes on both SST-2 and IMDB, and is only slightly worse than TinyBERT while using far fewer parameters. These results confirm the importance of inter-sample information in the distillation process. Current BERT distillation schemes lack the use of such information; focusing only on the distillation of individual samples limits the performance improvement of the student model, and subsequent research could pay more attention to mining inter-sample information. Finally, the code for this paper has been released on GitHub at https://github.com/bestahao/knowledge_distillnation.
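To make the inter-sample idea concrete, the following is a minimal sketch of the SPKD objective as it is commonly formulated (pairwise similarity matrices over a mini-batch, matched between teacher and student), applied here to BERT-to-Bi-LSTM distillation. The function name and the choice of features (the teacher's [CLS] vectors, the student Bi-LSTM's final hidden states) are illustrative assumptions, not the exact implementation used in this paper.

    import torch
    import torch.nn.functional as F

    def spkd_loss(teacher_feats: torch.Tensor, student_feats: torch.Tensor) -> torch.Tensor:
        """Similarity-preserving distillation loss (sketch).

        teacher_feats: (batch, d_t) sentence representations from the teacher,
                       e.g. BERT [CLS] vectors; d_t may differ from d_s.
        student_feats: (batch, d_s) sentence representations from the student,
                       e.g. the Bi-LSTM's final hidden state.
        """
        b = teacher_feats.size(0)
        # Pairwise similarity (Gram) matrices over the mini-batch.
        g_t = teacher_feats @ teacher_feats.t()   # (b, b)
        g_s = student_feats @ student_feats.t()   # (b, b)
        # Row-wise L2 normalisation of each similarity matrix.
        g_t = F.normalize(g_t, p=2, dim=1)
        g_s = F.normalize(g_s, p=2, dim=1)
        # Mean squared Frobenius distance between the two matrices.
        return torch.norm(g_s - g_t, p="fro") ** 2 / (b * b)

    # Usage example with random features standing in for real encoder outputs.
    t = torch.randn(16, 768)   # hypothetical BERT [CLS] vectors
    s = torch.randn(16, 300)   # hypothetical Bi-LSTM hidden states
    loss = spkd_loss(t, s)

In training, a loss of this form would typically be added to the usual soft-label and hard-label terms, so that the student is encouraged to preserve the teacher's similarity structure across samples rather than matching each sample's output in isolation.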