Research And Application Of Identification Method Of Government Official Document Named Entity

Posted on: 2020-03-18
Degree: Master
Type: Thesis
Country: China
Candidate: X H Wu
GTID: 2416330602961439
Subject: Computer technology
Abstract/Summary:
The exponential growth of administrative documents motivates the development of algorithms that process, annotate, and recognize key entities in the text. Such algorithms automatically analyze attributes such as personnel, team, organization, rank, and responsibility. Annotating and recognizing named entities is important because it underpins the systematic analysis and effective management of these documents. Named entity recognition (NER) is the task of extracting words or short phrases with a special meaning, such as person names and place names. State-of-the-art methods include approaches built on the Markov random field framework and approaches based on neural networks. This paper first studies a Conditional Random Field (CRF) approach in depth. It then introduces a novel algorithm based on a Bi-directional Long Short-Term Memory (Bi-LSTM) network, and extensive experiments show that the Bi-LSTM approach outperforms the CRF approach on the administrative-document annotation task. The main contribution of this paper is a combined model, Bi-LSTM-CRF, that integrates the CRF and the Bi-LSTM. The Bi-LSTM-CRF model is pre-trained on a People's Daily dataset and then fine-tuned on our own labeled administrative documents. To achieve a higher F-score, we modify the text embedding, improve the Bi-LSTM's activation function, apply Word2Vec, and tune the model's hyperparameters. Further experiments demonstrate the value of the proposed method in real-world applications.
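The abstract does not say how the Bi-LSTM-CRF model turns its scores into tags. In the usual formulation of this model family, the Bi-LSTM assigns a score to every (token, tag) pair, the CRF contributes tag-to-tag transition scores, and the best tag sequence is found with Viterbi decoding. The following is a minimal plain-Python sketch under those assumptions, not the thesis's implementation: the function name `viterbi_decode`, the three-tag set, and all scores are hypothetical, and the start/end transitions and forward-algorithm training loss of a full CRF layer are omitted.

```python
from typing import List, Tuple


def viterbi_decode(emissions: List[List[float]],
                   transitions: List[List[float]]) -> Tuple[List[int], float]:
    """Return the highest-scoring tag sequence and its total score.

    emissions[t][j]   -- score of tag j at token t (e.g. a Bi-LSTM output)
    transitions[i][j] -- score of tag i being followed by tag j (CRF weights)
    Start/end transition scores are omitted in this sketch.
    """
    n_tags = len(emissions[0])
    # best[j]: score of the best partial path that ends in tag j
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for emit in emissions[1:]:
        new_best: List[float] = []
        pointers: List[int] = []
        for j in range(n_tags):
            # pick the previous tag i that maximises best[i] + transitions[i][j]
            scores = [best[i] + transitions[i][j] for i in range(n_tags)]
            i_best = max(range(n_tags), key=scores.__getitem__)
            new_best.append(scores[i_best] + emit[j])
            pointers.append(i_best)
        best = new_best
        backpointers.append(pointers)
    # follow the back-pointers from the best final tag
    last = max(range(n_tags), key=best.__getitem__)
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


# Hypothetical example: tag 0 = O, 1 = B-ORG, 2 = I-ORG
TAGS = ["O", "B-ORG", "I-ORG"]
transitions = [[0.0, 0.0, -10.0],  # O -> I-ORG is heavily penalised
               [0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]
emissions = [[1.0, 0.75, 0.0],
             [0.0, 0.25, 1.0],
             [1.0, 0.0, 0.0]]
path, score = viterbi_decode(emissions, transitions)
print([TAGS[i] for i in path], score)  # ['B-ORG', 'I-ORG', 'O'] 2.75
```

Picking the highest-scoring tag per token would give O, I-ORG, O here, which is not a valid BIO sequence; the penalised O-to-I-ORG transition lets the decoder choose B-ORG, I-ORG, O instead. This sequence-level consistency is the usual motivation for adding a CRF layer on top of a Bi-LSTM.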
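The abstract reports gains in F-score without stating how it is computed. A common convention for NER, assumed here rather than taken from the thesis, is micro-averaged entity-level scoring: a predicted entity counts as correct only if both its type and its exact span match the gold annotation. The sketch below also assumes BIO tags; `bio_spans` and `entity_f1` are hypothetical helpers, and the example tags are illustrative only.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[str, int, int]  # (entity type, start token, end token exclusive)


def bio_spans(tags: List[str]) -> Set[Span]:
    """Collect typed entity spans from a BIO tag sequence."""
    spans: Set[Span] = set()
    etype: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final entity
        # the open entity ends on O, on any B-, or on an I- of another type
        if etype is not None and (tag == "O" or tag.startswith("B-")
                                  or tag[2:] != etype):
            spans.add((etype, start, i))
            etype = None
        # B- opens an entity; a stray I- with no open entity is treated like B-
        if tag.startswith("B-") or (tag.startswith("I-") and etype is None):
            etype, start = tag[2:], i
    return spans


def entity_f1(gold: List[List[str]],
              pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged entity-level precision, recall and F1 over sentences."""
    tp = n_pred = n_gold = 0
    for g, p in zip(gold, pred):
        g_spans, p_spans = bio_spans(g), bio_spans(p)
        tp += len(g_spans & p_spans)  # exact type-and-span matches
        n_pred += len(p_spans)
        n_gold += len(g_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


gold = [["B-PER", "I-PER", "O", "B-ORG", "I-ORG"]]
pred = [["B-PER", "I-PER", "O", "B-ORG", "O"]]
print(entity_f1(gold, pred))  # (0.5, 0.5, 0.5)
```

Because the predicted ORG entity covers only one of its two gold tokens, it earns no credit, even though four of the five tags match at the token level. Entity-level scores are therefore usually stricter than token-level accuracy.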
Keywords/Search Tags: Administrative document, Named entity recognition, CRF, Bi-LSTM-CRF