
Research On Coreference Resolution Based On The Maximum Entropy Model

Posted on: 2008-11-26
Degree: Master
Type: Thesis
Country: China
Candidate: N Pang
GTID: 2178360242469504
Subject: Computer application technology
Abstract/Summary:
Information is growing explosively, and discourse-processing techniques are now widely applied. As a result, anaphora resolution has become more important than ever and has drawn growing attention from researchers. Coreference resolution is a very important subtask of anaphora resolution, with broad practical and social value. Coreference is common in news reports about emergency events and appears frequently in both discourse and dialogue, where it makes reports more concise. Coreference resolution is therefore necessary for information extraction. It is one of the focal problems in natural language processing (NLP) and draws on many NLP techniques, such as part-of-speech tagging and noun phrase recognition. It is also an essential component of NLP applications such as text information extraction and question answering systems.

This paper is based on an in-depth analysis of coreference features in Chinese texts about emergency events. It presents an exploratory, corpus-based approach to coreference resolution in Chinese news reports about emergency events, using a statistical machine learning algorithm, the Maximum Entropy Model. The approach identifies the pronouns, nouns, and noun phrases in a news report that refer to the same object.

The main characteristics of the model are as follows:
1. Automatic machine learning. After training on a sample corpus, the model produces its feature set automatically, whereas traditional approaches construct such a feature set manually.
2. Good extensibility. Knowledge about the relevant domain can be added or removed as practice requires, so the system can be ported conveniently.
3. Good robustness. Current NLP techniques are imperfect and every feature is obtained through NLP processing, so errors are unavoidable. Nevertheless, the experiments demonstrate that the algorithm is robust.

This paper makes a preliminary study of coreference in news reports about emergency events. It describes the learning and implementation of a coreference resolution model based on Maximum Entropy and evaluates the algorithm comprehensively. We annotated a corpus of 200,000 Chinese characters for training and testing. In the experiments, the F-measure reaches 59.98% in the open test and 64.6% in the closed test. These results indicate that the model resolves coreference in news reports about emergency events effectively. It achieves the desired results for personal pronoun and alias coreference resolution.

The paper also analyzes the main error types that affect the coreference resolution model: part-of-speech tagging errors, noun phrase recognition errors, and feature value assignment errors. Finally, it outlines directions for further research and lays a foundation for future work. These directions are introducing syntactic features into coreference resolution and evaluating the model within the ACE (Automatic Content Extraction) framework.
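The abstract does not give the feature set, the training procedure, the pairing scheme, or the scoring metric. The following is therefore only a minimal Python sketch of how a Maximum Entropy mention-pair classifier of this general kind might work. Several parts are assumptions, not the thesis's actual method or scorer:
- the binary context predicates (`string_match`, `alias`, `pronoun_anaphor`, and so on);
- the helper names `MaxEntModel`, `resolve`, and `f_measure`;
- the plain gradient-ascent trainer (MaxEnt toolkits commonly use GIS or L-BFGS instead);
- closest-first linking;
- the pairwise-link F-measure.

```python
import math
from collections import defaultdict

LABELS = ("coref", "not_coref")


def feature_keys(context, label):
    # Conjoin each active context predicate with a label to obtain the
    # binary feature functions f_i(x, y) of a conditional MaxEnt model.
    return [(pred, label) for pred in context]


class MaxEntModel:
    """Hypothetical mention-pair classifier: p(y|x) = exp(sum_i w_i f_i(x, y)) / Z(x)."""

    def __init__(self):
        self.weights = defaultdict(float)  # one weight per (predicate, label) feature

    def prob(self, context):
        scores = {y: sum(self.weights[k] for k in feature_keys(context, y)) for y in LABELS}
        top = max(scores.values())  # subtract the max for numerical stability
        exps = {y: math.exp(s - top) for y, s in scores.items()}
        z = sum(exps.values())  # normalisation term Z(x)
        return {y: e / z for y, e in exps.items()}

    def train(self, data, epochs=200, lr=0.5, l2=0.01):
        # Assumption: plain batch gradient ascent on the L2-regularised
        # conditional log-likelihood (toolkits often use GIS or L-BFGS).
        for _ in range(epochs):
            grad = defaultdict(float)
            for context, gold in data:
                for k in feature_keys(context, gold):
                    grad[k] += 1.0  # observed feature count
                p = self.prob(context)
                for y in LABELS:
                    for k in feature_keys(context, y):
                        grad[k] -= p[y]  # expected feature count under the model
            for k in set(grad) | set(self.weights):
                self.weights[k] += lr * (grad[k] - l2 * self.weights[k]) / len(data)

    def classify(self, context):
        p = self.prob(context)
        return max(p, key=p.get)


def resolve(mentions, extract_features, model):
    # Assumed closest-first decoding: link each mention (anaphor) to the
    # nearest preceding mention that the model labels as coreferent.
    links = set()
    for j in range(1, len(mentions)):
        for i in range(j - 1, -1, -1):
            if model.classify(extract_features(mentions[i], mentions[j])) == "coref":
                links.add((j, i))
                break
    return links


def f_measure(gold_links, predicted_links):
    # Assumed pairwise-link F-measure: F = 2PR / (P + R).
    correct = len(gold_links & predicted_links)
    precision = correct / len(predicted_links) if predicted_links else 0.0
    recall = correct / len(gold_links) if gold_links else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy training pairs with hypothetical binary context predicates.
train_data = [
    ({"string_match", "same_sentence"}, "coref"),
    ({"alias"}, "coref"),
    ({"pronoun_anaphor", "same_gender", "distance_le_2"}, "coref"),
    ({"pronoun_anaphor", "gender_mismatch"}, "not_coref"),
    ({"distance_gt_5"}, "not_coref"),
    ({"gender_mismatch", "distance_gt_5"}, "not_coref"),
]
model = MaxEntModel()
model.train(train_data)
```

In this toy setup, `model.classify({"alias"})` returns `"coref"`, because `alias` occurs only in a coreferent training pair. A real system would instead extract such predicates from the part-of-speech-tagged, noun-phrase-recognized text described above. The upstream errors the abstract lists would then surface as wrong predicate values.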
Keywords/Search Tags: Maximum Entropy, Coreference resolution, Corpus, Natural Language Processing