
Conditional Random Fields Based English Name Entity Recognition

Posted on: 2007-04-07
Degree: Master
Type: Thesis
Country: China
Candidate: J Zhang
Full Text: PDF
GTID: 2178360185485613
Subject: Computer Science and Technology
Abstract/Summary:
Named entities (NEs), as basic information units of text, are essential to understanding a text correctly. Named entity recognition (NER) identifies the words in a document that belong to NEs and classifies them into predefined categories. NER is widely used in machine translation, text classification, information retrieval, automatic summarization, and other natural language processing applications, so progress on it will advance research in these related fields. This thesis concentrates on English NER. NER is implemented with two methods, an improved hidden Markov model (HMM) and Conditional Random Fields (CRF), and the test results are analyzed.

First, this thesis identifies English named entities with the improved HMM. The results show that it performs better than the standard HMM, but it is not good at integrating features such as context information and semantic information.

Second, this thesis identifies English named entities with a Conditional Random Field approach that integrates many features. Analysis of feature selection shows that the choice and implementation of features are key components of the system and are crucial to its performance. We therefore use not only local features but also global features, which are extracted from other occurrences of each word within the same document. Furthermore, name lists are adopted to enhance system performance. The test set for English NER is the official test data from CoNLL-2003, on which the system achieves an F-measure of 84%.
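The three kinds of features mentioned in the abstract (local features, global features drawn from other occurrences of a word in the same document, and name-list lookups) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the thesis: the function names, the feature names, the tiny `PERSON_NAMES` list, and the particular global feature (whether a word form appears capitalized in a non-sentence-initial position anywhere in the document).

```python
# Hypothetical name list (gazetteer); the thesis does not specify its lists.
PERSON_NAMES = {"john", "mary"}


def local_features(tokens, i):
    """Features from the token itself and its immediate neighbours."""
    word = tokens[i]
    return {
        "word.lower": word.lower(),
        "word.is_capitalized": word[:1].isupper(),
        "word.is_all_caps": word.isupper(),
        "word.has_digit": any(ch.isdigit() for ch in word),
        "word.suffix3": word[-3:].lower(),
        # Name-list feature: is the token found in the (assumed) list?
        "in_person_list": word.lower() in PERSON_NAMES,
        # Context features, with boundary markers at the sentence edges.
        "prev.lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next.lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def document_features(sentences):
    """Local features plus one illustrative document-level (global) feature.

    `sentences` is a list of token lists that belong to one document.
    """
    # First pass: collect word forms that occur capitalized somewhere other
    # than the start of a sentence, where capitalization is uninformative.
    capitalized_mid_sentence = set()
    for sent in sentences:
        for i, word in enumerate(sent):
            if i > 0 and word[:1].isupper():
                capitalized_mid_sentence.add(word.lower())

    # Second pass: attach the global feature to every occurrence of the word.
    all_feats = []
    for sent in sentences:
        sent_feats = []
        for i, word in enumerate(sent):
            feats = local_features(sent, i)
            feats["global.cap_mid_sentence_in_doc"] = (
                word.lower() in capitalized_mid_sentence
            )
            sent_feats.append(feats)
        all_feats.append(sent_feats)
    return all_feats


doc = [["Zhang", "visited", "Beijing", "."], ["Beijing", "is", "large", "."]]
feats = document_features(doc)
```

In this sketch the sentence-initial "Beijing" in the second sentence receives the global feature because the same word is capitalized mid-sentence in the first one, which is the kind of evidence a purely local feature set cannot supply. Per-token feature dictionaries like these are a common input format for linear-chain CRF toolkits, though the thesis does not state which implementation it used.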
Keywords/Search Tags: named entity recognition, hidden Markov model, conditional random field, global feature