Research on Chinese Named Entity Recognition Based on Conditional Random Fields

Posted on: 2007-07-28
Degree: Master
Type: Thesis
Country: China
Candidate: Z Q Wang
Full Text: PDF
GTID: 2208360185991665
Subject: Computer application technology
Abstract/Summary:
In the taxonomy of computational linguistics tasks, named entity recognition falls under information extraction. The task is significant for information retrieval, machine translation, automatic document indexing, and data mining. This thesis uses a statistical model, Conditional Random Fields (CRF), to study Chinese named entity recognition, with the goal of recognizing person names, locations, and organizations in documents.

1. This thesis describes CRF models in detail and, by comparison with other models used for sequence labeling problems, describes the main characteristics of this emerging model.
2. We use mutual information to obtain external statistical lexicons from existing corpus resources, and use these lexicons to introduce external features into the training process. The experimental results show that introducing external features reduces the need for training data and accordingly improves entity recognition performance markedly.
3. We introduce a certainty-based active learning strategy for training the organization recognizer. The experiments show that, with the same amount of labeled samples, recognition performance improves and redundancy is reduced.
4. With CRF as the basic model, we design and build an experimental system that recognizes Chinese person names (including foreign names) and locations at the character level, and organizations at the word level. The system is designed to be readily extensible.
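The abstract does not give the exact formula used to build the statistical lexicons from mutual information. As a rough illustration only, the sketch below scores adjacent character pairs by pointwise mutual information (PMI) and keeps the high-scoring pairs as lexicon entries. The function names `pmi_scores` and `build_lexicon`, the `threshold` and `min_count` parameters, and the toy corpus are all assumptions made for this example and are not taken from the thesis.

```python
import math
from collections import Counter


def pmi_scores(sentences, min_count=2):
    """Score adjacent character pairs by pointwise mutual information.

    PMI(a, b) = log2( P(ab) / (P(a) * P(b)) ), with probabilities
    estimated from raw unigram and bigram counts (an assumed setup).
    """
    unigrams = Counter()
    bigrams = Counter()
    for s in sentences:
        unigrams.update(s)               # count single characters
        bigrams.update(zip(s, s[1:]))    # count adjacent character pairs
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (a, b), count in bigrams.items():
        if count < min_count:            # skip rare pairs: PMI is unreliable for them
            continue
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        scores[a + b] = math.log2(p_ab / (p_a * p_b))
    return scores


def build_lexicon(sentences, threshold=1.0, min_count=2):
    """Keep character pairs whose PMI reaches the (hypothetical) threshold."""
    scores = pmi_scores(sentences, min_count)
    return {pair for pair, score in scores.items() if score >= threshold}


if __name__ == "__main__":
    toy_corpus = ["北京大学", "北京市", "大学生", "在北京"]
    print(sorted(build_lexicon(toy_corpus)))
```

Membership in such a lexicon could then be exposed to the CRF as an extra binary feature per character or word, which is one plausible reading of the "external features" mentioned above.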
Keywords/Search Tags: Named Entity Recognition, Conditional Random Field, Feature, Statistical Lexicon, Active Learning