
Research On Boundary-based Nested Named Entity Recognition Method

Posted on: 2021-03-19
Degree: Master
Type: Thesis
Country: China
Candidate: L F Wu
GTID: 2438330623484369
Subject: Computer Science and Technology
Abstract/Summary:
Named entity recognition (NER) is a foundational research task in natural language processing. Traditional approaches adopt shallow sequence labeling models (such as the Hidden Markov Model and the Conditional Random Field) that output one label per word, marking the "begin", "inside", and "outside" of an entity (BIO tags). Such a model usually outputs the single label path with the highest probability, found by a dynamic programming algorithm. Because traditional NER mainly processes independent sentences, sparse context features are a prominent shortcoming. In addition, a single highest-probability label path cannot represent nested named entities (nested NEs). To recognize nested NEs, most related work employs a cascading model: first generate candidate entities, then classify them. A cascading model can only optimize each stage separately, cannot reach a global optimum, and is prone to cascading failures.

To address sparse features and the difficulty of recognizing nested entities, this thesis proposes a deep boundary assemble model (NNBA). Based on the boundary assemble algorithm, NNBA constructs a BERT-based cascade model. First, a sequence labeling algorithm based on Bi-LSTM-CRF recognizes the start and end boundaries of NEs. Then, candidate NEs are generated by boundary assembly. Finally, a Multi-LSTM model discriminates among the candidate entities. Because BERT can use external data to automatically obtain semantic information and context dependence, NNBA effectively overcomes the sparse feature problem faced by shallow learning models. On the ACE2005 Chinese data set, the F1 value reached 90.12%, exceeding the comparison method by 17%.

To address the shortcomings of the cascading model, this thesis further proposes an end-to-end boundary regression model (BR) built on NNBA. BR draws on ideas from object detection and adopts a linear sampling algorithm and a boundary regression algorithm suited to the characteristics of linear text sequences and NER. A multi-objective learning model based on a neural network and the boundary regression algorithm is constructed in an end-to-end manner. While predicting the classification label of a text border, the model also predicts its position, which makes more effective use of the supervision information in the labeled data. The BR model performed well on the ACE2005 Chinese data set, with an F1 value of 89.30%.
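
The boundary assembly step described for NNBA can be illustrated with a minimal Python sketch. The abstract does not give the algorithm's details, so everything here is an assumption made for illustration: the function name assemble_candidates, the max_len cap on span length, and the hand-written boundary positions (in NNBA these would come from the Bi-LSTM-CRF boundary taggers). The sketch only shows why pairing start and end boundaries can yield nested candidates that a single BIO label path cannot represent.

```python
from typing import List, Tuple


def assemble_candidates(
    starts: List[int], ends: List[int], max_len: int = 10
) -> List[Tuple[int, int]]:
    """Pair each predicted start boundary with every predicted end boundary
    at or after it, keeping spans of at most max_len tokens.

    max_len is a hypothetical cap, not a parameter stated in the abstract.
    """
    candidates = []
    for s in sorted(set(starts)):
        for e in sorted(set(ends)):
            if s <= e and e - s + 1 <= max_len:
                candidates.append((s, e))  # inclusive token span [s, e]
    return candidates


# Hypothetical boundary predictions for one sentence.
tokens = ["the", "president", "of", "the", "United", "States"]
starts = [0, 3]  # assumed output of a start-boundary tagger
ends = [5]       # assumed output of an end-boundary tagger

for s, e in assemble_candidates(starts, ends):
    print((s, e), " ".join(tokens[s : e + 1]))
```

Both candidates end at the same token and one lies inside the other, so each can be passed to a separate candidate classifier (the Multi-LSTM discriminator in NNBA) without forcing a single label per token.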
Keywords/Search Tags:Nested Named Entity Recognition, Boundary Assemble, Boundary Regression, Information Extraction, Deep Learning