Research Of Key Issues In Coreference Resolution

Posted on: 2010-05-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: F Kong
GTID: 1118360278978093
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet and computer technologies, the volume of available information has grown explosively, and the demand for precisely located information gives strong impetus to NLP research. As an important and active research topic in NLP, coreference resolution plays a critical role in many NLP applications, such as text summarization, machine translation, information extraction and multilingual information processing. Meanwhile, coreference resolution depends heavily on diverse NLP techniques, including part-of-speech tagging, named entity recognition, syntactic parsing and semantic analysis, and is thus challenging.

As the basis of this research, a state-of-the-art coreference resolution platform is first built. Then, some key issues in coreference resolution are addressed extensively from both syntactic and semantic perspectives. The contributions of this research are as follows:

1. Centering theory-based coreference resolution, with a focus on how to extend Centering Theory from the grammatical layer to the semantic layer via semantic roles. In addition, three related sets of features, i.e. semantic role features, pronominal ranking features and pronominal subcategory features, are employed to explore the impact of Centering Theory-driven features on coreference resolution. Experimentation on the ACE 2003 English corpus shows that such features significantly improve the performance of coreference resolution, particularly for pronoun resolution. It also shows that such features benefit both short-range and long-range coreference resolution.

2. Tree kernel-based coreference resolution, with a focus on exploring various tree structures to capture structural information in parse trees, including a) inclusion of syntactic descriptive information around antecedent candidates, motivated by Centering Theory; b) inclusion of information related to the competitors of the antecedent candidate; c) inclusion of semantic information, such as semantic roles and pronominal subcategories, in the tree structures. Experimentation on the ACE 2004 English corpus shows that the tree kernel-based method significantly improves the performance of coreference resolution in English, especially for pronoun resolution within a single sentence. In addition, experimentation on the ACE 2005 Chinese corpus shows that the tree kernel-based method also yields a large improvement in the performance of coreference resolution in Chinese. This suggests that structural information plays an important role in coreference resolution and that this role is language-independent, at least for English and Chinese.

3. Anaphoricity determination in coreference resolution. Rule-based, feature-based and tree kernel-based methods are first explored for learning noun phrase anaphoricity, and are then applied systematically to coreference resolution. Experimentation on both the ACE 2004 English corpus and the ACE 2005 Chinese corpus shows that proper anaphoricity determination can significantly improve the performance of coreference resolution in both English and Chinese.
Keywords/Search Tags:Coreference Resolution, Centering Theory, Semantic Role, Tree Kernel Method, Anaphoricity Determination