
Research On Chinese Named Entity Recognition Based On Knowledge Distillation

Posted on: 2024-07-12 | Degree: Master | Type: Thesis
Country: China | Candidate: H L Zhao | Full Text: PDF
GTID: 2568307058972099 | Subject: Electronic information
Abstract/Summary:
Supervised named entity recognition models can achieve excellent performance when the training set is correctly and sufficiently annotated. In practice, however, annotated data for named entity recognition is scarce, and data augmentation can effectively alleviate this shortage; existing augmentation methods, though, are prone to introducing noise, which lowers the quality of the augmented data. In addition, large pre-trained language models such as BERT perform very well on named entity recognition tasks, but their large number of parameters and high memory requirements are a serious drawback. Knowledge distillation is an effective way to address this problem, yet the traditional knowledge distillation loss can give poor distillation performance because of the coupling between its internal terms. To address these issues, this thesis carries out research from three perspectives: improving data augmentation, removing the internal coupling of the traditional knowledge distillation loss, and designing knowledge distillation models. The main contents are as follows.

(1) A method combining k-best Viterbi decoding with Decoupled Knowledge Distillation (kvDKD) and an Improved Data Augmentation method (IDA) are proposed, and on this basis a new Named Entity Recognition model (NER-kvDKD) is built. (a) The IDA method uses data filtering and an entity-rebalancing algorithm to reduce both the noise present in the original dataset and the annotation errors introduced by data augmentation, which improves dataset quality and reduces overfitting. (b) The kvDKD method combines k-best Viterbi decoding with a decoupled knowledge distillation loss, removing the coupling inside the traditional knowledge distillation loss. Experimental results on four datasets show that the student model obtained by NER-kvDKD improves the F1-score over the baseline model by 0.37%~12.8% and also improves the generalization ability of the model.

(2) Choosing an appropriate student network structure is key to the effectiveness of knowledge distillation. To this end, k-best Viterbi decoupled knowledge distillation NER models with different student networks are proposed on the basis of NER-kvDKD. These models use IDCNN-MCRF and BiGRU-MCRF as the student networks: both are widely used in named entity recognition, structurally simple, and close in structure to the teacher model (BERT-MCRF). Based on these two student networks, two new knowledge distillation models, NER1-kvDKD and NER2-kvDKD, are proposed; they address the limitation that NER-kvDKD relies on a single student network structure and yield lighter distilled models with better results. Experimental results on the four datasets show that the lightweight model obtained with NER2-kvDKD achieves the best parameter count, prediction time, and server response time, reducing the number of parameters by 98.3% compared with the teacher model and improving prediction speed by 59.4%.
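
To make the decoupling idea concrete, the sketch below shows a DKD-style loss in PyTorch that splits the classic distillation objective into a target-class term and a non-target-class term. It is only an illustrative sketch under the assumption of per-token label logits from teacher and student; it is not the thesis's kvDKD implementation (which further incorporates k-best Viterbi decoding), and all names and default values are hypothetical.

# Illustrative sketch (not the thesis code): a DKD-style loss that separates the
# target-class and non-target-class parts of the KD objective, removing the
# coupling between them. The k-best Viterbi component of kvDKD is omitted.
import torch
import torch.nn.functional as F


def dkd_style_loss(student_logits, teacher_logits, target, alpha=1.0, beta=1.0, T=4.0):
    """student_logits, teacher_logits: (N, C) per-token label scores; target: (N,) gold label ids."""
    gold = F.one_hot(target, num_classes=student_logits.size(-1)).bool()

    p_s = F.softmax(student_logits / T, dim=-1)
    p_t = F.softmax(teacher_logits / T, dim=-1)

    # Target-class KD: binary distributions over {gold label, all other labels}.
    pt_s = torch.stack([p_s[gold], 1.0 - p_s[gold]], dim=-1)
    pt_t = torch.stack([p_t[gold], 1.0 - p_t[gold]], dim=-1)
    tckd = F.kl_div(pt_s.clamp_min(1e-8).log(), pt_t, reduction="batchmean") * T ** 2

    # Non-target-class KD: distribution over the remaining labels only, so its
    # weight is no longer tied to the teacher's confidence in the gold label.
    mask = ~gold
    logn_s = F.log_softmax(student_logits[mask].view(len(target), -1) / T, dim=-1)
    n_t = F.softmax(teacher_logits[mask].view(len(target), -1) / T, dim=-1)
    nckd = F.kl_div(logn_s, n_t, reduction="batchmean") * T ** 2

    return alpha * tckd + beta * nckd

In use, such a loss would typically be added to the student's own supervised (CRF or cross-entropy) loss with a weighting coefficient, with alpha and beta tuned separately because the two terms are decoupled.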
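Likewise, the following minimal sketch illustrates k-best Viterbi decoding over per-token emission scores and a label transition matrix, assuming a standard linear-chain tagging setup; in a kvDKD-style pipeline the k best teacher sequences could serve as additional distillation targets, which is omitted here.

# Illustrative sketch (not the thesis code): exact k-best Viterbi decoding for a
# linear-chain label sequence, keeping the k best partial paths per end label.
import heapq
import numpy as np


def kbest_viterbi(emissions, transitions, k=3):
    """emissions: (T, C) scores; transitions: (C, C), transitions[i, j] = score of label i -> j.
    Returns up to k (score, label_sequence) pairs, best first."""
    T, C = emissions.shape
    # beams[c] holds up to k (score, path) entries for paths ending in label c.
    beams = [[(emissions[0, c], [c])] for c in range(C)]

    for t in range(1, T):
        new_beams = []
        for c in range(C):
            candidates = []
            for prev in range(C):
                for score, path in beams[prev]:
                    candidates.append(
                        (score + transitions[prev, c] + emissions[t, c], path + [c])
                    )
            # Keep only the k best partial paths ending in label c.
            new_beams.append(heapq.nlargest(k, candidates, key=lambda x: x[0]))
        beams = new_beams

    # Merge the per-label beams and return the global k best sequences.
    finals = [entry for beam in beams for entry in beam]
    return heapq.nlargest(k, finals, key=lambda x: x[0])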
Keywords/Search Tags: Named entity recognition, Data augmentation, Knowledge distillation, K-best Viterbi decoding, IDCNN-MCRF, BiGRU-MCRF