Research And Implementation Of Chinese Named Entity Recognition Based On Lattice-LSTM Model

Posted on: 2022-10-27
Degree: Master
Type: Thesis
Country: China
Candidate: C M Huang
Full Text: PDF
GTID: 2518306575966659
Subject: Computer technology
Abstract/Summary:
Named entity recognition (NER) is a branch of information filtering and belongs to information extraction within natural language processing. It is applied in large-scale information filtering, knowledge graphs, and question answering systems. NER focuses mainly on entities such as person names and place names. Despite its long history, it still faces difficulties, such as the loss of information in long sentences and the neglect of global sentence information. This thesis studies these problems, and its main work is as follows:

1. Bi-LSTM-CRF, Bi-LSTM-CNNs-CRF, Lattice-LSTM and other models are studied in depth, and some of them are reproduced to understand how they process text. Most of these models are built on the LSTM structure, which tends to lose text information in long sentences. In addition, they judge entities only from character information and do not use sentence structure information to judge entities from the perspective of the whole sentence.

2. An improved model, LLPA, is proposed. It uses the Lattice-LSTM model to extract sentence text information and a Rel-Pos-Attention model to extract sentence structure information. It also changes how the attention weights are calculated and selects the position encoding. Experiments on 4 datasets show that LLPA recognizes entities more accurately, improving the comprehensive evaluation index F1 by about 2%.

3. Two extended experiments are carried out. The first aims to improve recognition by replacing the word vectors, and the second verifies the robustness of the model on corpora from two different domains. In the first, word vectors are trained with the GloVe and ELMo methods and applied in the experiments, which shows that better word vectors further improve recognition. In the second, in addition to the standard datasets, two datasets from different domains are collected from the Internet and used in experiments. The results show that the improved model has a certain degree of robustness and can identify the corresponding entities in datasets from different domains.

4. Finally, a Chinese named entity recognition system is designed and implemented on the basis of the improved model. The system has two main functions. The first recognizes entities in multi-sentence text and displays the results. The second recognizes entities in batches of text stored in electronic documents and writes the results to corresponding output documents, so that users can easily inspect and use them.
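As an illustration of the comprehensive evaluation index mentioned in the abstract, the sketch below computes entity-level precision, recall and F1 from tag sequences. It assumes that the index is the usual F1 score, that tags follow the BIO scheme, and that an entity counts as correct only when its boundaries and type both match exactly. The abstract states none of these details, and the names `extract_spans` and `prf1` are hypothetical, not taken from the thesis.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, entity type); an assumed representation
Span = Tuple[int, int, str]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (assumed tagging scheme)."""
    spans: Set[Span] = set()
    start, etype = None, None
    # A trailing "O" sentinel closes an entity that runs to the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if continues:
            continue
        if etype is not None:
            spans.add((start, i, etype))
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
        else:
            # "O" or a stray "I-" tag without a matching "B-" is ignored here.
            start, etype = None, None
    return spans


def prf1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and F1 under exact span matching."""
    gold_spans = extract_spans(gold)
    pred_spans = extract_spans(pred)
    true_pos = len(gold_spans & pred_spans)
    precision = true_pos / len(pred_spans) if pred_spans else 0.0
    recall = true_pos / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC"]
    pred = ["B-PER", "I-PER", "O", "B-LOC", "O"]
    # The person name matches; the place name has a wrong right boundary.
    print(prf1(gold, pred))  # (0.5, 0.5, 0.5)
```

Under exact matching, a predicted entity with one wrong boundary counts as both a false positive and a false negative, so a gain of about 2% in F1 reflects more entities recovered with exactly correct boundaries and types.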
Keywords/Search Tags: Chinese named entity recognition, Lattice-LSTM model, attention weight, relative position coding