
Research On Nested Named Entity Recognition Based On Knowledge Embedding And Boundary Enhancement

Posted on: 2022-10-23 | Degree: Master | Type: Thesis
Country: China | Candidate: J J Liao | Full Text: PDF
GTID: 2518306530498194 | Subject: Computer application technology
Abstract/Summary:
Named entity recognition (NER) plays a fundamental and key role in natural language processing, yet it has long faced technical difficulties. Current approaches still cannot effectively solve the out-of-vocabulary (OOV) problem of entities, and most of them focus on flat named entity recognition; the sequence-labeling models they propose cannot handle nested entities. The powerful pre-trained language model BERT has advanced the NER task, but its architecture does not distinguish the local context effectively and it has no pre-training task designed for entity recognition. In view of the OOV and nested-structure problems of entities and these deficiencies of BERT, this thesis proposes a nested named entity recognition model based on knowledge embedding and boundary enhancement. The model is built on BERT, with corresponding improvements in its embedding layer, representation layer and output layer.

1. An entity knowledge embedding method. In practical applications, the NER task faces frequent entity updates and the constant emergence of OOV entities. More and more research therefore relies on a knowledge base to solve the OOV problem, replacing model iteration with knowledge-base iteration so that OOV entities are covered quickly. This thesis proposes a new knowledge embedding method that injects unsupervised entity information into the model during the fine-tuning phase. The method mainly uses the attention mechanism and the position embedding of BERT to realize point-to-point knowledge embedding and, with the help of the input structure of BERT's next sentence prediction (NSP) task, achieves an effective fusion of entity knowledge with the model. Built on BERT-base and BERT-small, the knowledge embedding method improves performance on all five common benchmark datasets. In addition, its F1 score on an OOV dataset increases by 4.33%, indicating that the method is a targeted and effective solution to the OOV problem.

2. A triangle exchange mechanism. BERT uses position embeddings to capture the positional relationship between words, but this does not distinguish the preceding context from the following context, which leads to inaccurate modeling. This thesis therefore proposes a new method, the triangle exchange mechanism, which enhances the feature representation of text by changing the structure of the model's self-attention mechanism. The principle is to exchange, according to certain rules, the upper and lower triangles of the homologous attention score matrices produced by the multiple heads, thereby enlarging the dynamic range of the attention values and capturing the difference between preceding and following positions. Compared with the original BERT model, the proposed method improves the convergence speed and accuracy of the masked language model pre-training task, showing that it strengthens the model's ability to represent the local context. When applied to the NER task, the triangle exchange mechanism outperforms the model with the native structure on every dataset, bringing a 0.5%-1.0% performance improvement and clearly demonstrating its advantage for NER.
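The abstract does not specify the exact exchange rule, so the following is only a minimal sketch of the idea, assuming that heads are paired and that each pair swaps the strict upper triangles of their attention score matrices; the pairing rule, the tensor layout and the function name triangle_exchange are illustrative assumptions rather than the thesis's actual design.

```python
import torch


def triangle_exchange(attn: torch.Tensor) -> torch.Tensor:
    """Swap the strict upper triangles of paired attention matrices.

    attn: attention scores from the heads of one layer,
          shape (num_heads, seq_len, seq_len).
    Pairing head 0 with head 1, head 2 with head 3, and so on is an
    assumption; the abstract only states that upper and lower triangles
    are exchanged among the homologous matrices of multiple heads.
    """
    num_heads, n, _ = attn.shape
    upper = torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)
    out = attn.clone()
    for i in range(0, num_heads - 1, 2):
        a, b = attn[i], attn[i + 1]
        # Each head keeps its own below-diagonal (preceding-position)
        # scores but takes its partner's above-diagonal
        # (following-position) scores, so the two directions are no
        # longer treated symmetrically.
        out[i][upper] = b[upper]
        out[i + 1][upper] = a[upper]
    return out


# Example: 4 heads over a sequence of 6 tokens.
scores = torch.randn(4, 6, 6)
mixed = triangle_exchange(scores)
```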
3. A nested entity extraction method based on boundary detection and span classification (B&S). For the extraction of nested entities, many existing methods are inefficient and ignore the boundary information of the entity. This thesis divides nested entity extraction into two sub-tasks: boundary prediction and span classification. Span classification splits the input sequence into sub-sequences of different lengths and classifies them, which clearly speeds up model prediction; however, the length threshold of the candidate spans depends on the length of the entities in the NER task, that is, the boundaries of the entities must be obtained first. Boundary detection and span classification are therefore combined into a single method, sketched below. Its F1 scores on the ACE2005 and GENIA datasets are 84.2% and 78.6%, improvements of 13.2% and 7.2% respectively over the span-based model FOFE. Moreover, built on BERT-base, the method achieves the best results among all compared baseline models, indicating that it improves not only the efficiency of the model but also the accuracy of entity recognition. An applicability experiment further shows that the method is suitable for all kinds of NER tasks.

In addition to the above individual verification experiments, comprehensive application experiments show that the combination of BERT-base + B&S + knowledge embedding outperforms all current baseline models on the five benchmark datasets, with F1 scores reaching 85.1% on ACE2005 and 79.8% on GENIA, which verifies the effectiveness of the proposed model in solving the OOV and nested-entity problems.
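As a reading aid, here is a minimal sketch of the boundary-detection-plus-span-classification idea on top of BERT token representations. The class name BoundaryAndSpanHead, the layer sizes, the threshold of 0 on the boundary logits, the start/end-vector concatenation used to represent a span, and the max_span_len cut-off are all illustrative assumptions, not the architecture actually used in the thesis.

```python
import torch
import torch.nn as nn


class BoundaryAndSpanHead(nn.Module):
    """Hypothetical head combining boundary detection with span classification."""

    def __init__(self, hidden: int, num_types: int, max_span_len: int = 16):
        super().__init__()
        self.boundary = nn.Linear(hidden, 2)                   # per-token start / end logits
        self.span_cls = nn.Linear(hidden * 2, num_types + 1)   # +1 for the "no entity" class
        self.max_span_len = max_span_len

    def forward(self, token_reprs: torch.Tensor):
        # token_reprs: (seq_len, hidden) BERT outputs for one sentence.
        b_logits = self.boundary(token_reprs)                  # (seq_len, 2)
        starts = (b_logits[:, 0] > 0).nonzero().flatten().tolist()
        ends = (b_logits[:, 1] > 0).nonzero().flatten().tolist()
        spans, feats = [], []
        # Only spans whose endpoints were flagged as boundaries are classified,
        # which keeps the candidate set far smaller than exhaustive O(n^2)
        # span enumeration while still allowing nested (overlapping) spans.
        for s in starts:
            for e in ends:
                if s <= e < s + self.max_span_len:
                    spans.append((s, e))
                    feats.append(torch.cat([token_reprs[s], token_reprs[e]]))
        if not feats:
            return [], torch.empty(0)
        logits = self.span_cls(torch.stack(feats))             # (num_spans, num_types + 1)
        return spans, logits


# Example: score candidate spans for a 10-token sentence with 768-dim BERT vectors.
head = BoundaryAndSpanHead(hidden=768, num_types=7)
spans, logits = head(torch.randn(10, 768))
```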
Keywords/Search Tags: Named entity recognition, nested, BERT, self-attention mechanism, boundary detection