
Research And Implementation Of Chinese Cross-Document Coreference Resolution

Posted on: 2011-10-15  Degree: Master  Type: Thesis
Country: China  Candidate: C S Lu  Full Text: PDF
GTID: 2178360305976427  Subject: Computer application technology
Abstract/Summary:
Cross-document coreference resolution is an important topic in natural language processing and a key component of application systems such as information extraction (IE), information retrieval (IR), and multi-document summarization. Over the past decade, research has focused mainly on coreference resolution within a single text. As the technology has matured, cross-document coreference resolution has attracted increasing attention: because it builds coreference chains across documents, more information about each entity can be gathered from the texts. In addition, useful information obtained from cross-document coreference resolution can be fed back into single-document coreference resolution, which may help it achieve a breakthrough.

As the study of Chinese cross-document coreference resolution is still in its infancy, this thesis gives a detailed introduction to research on English cross-document coreference resolution. Drawing on that work, we design a platform for Chinese cross-document coreference resolution. The research covers two parts: resolution of personal names and resolution of place names. For personal names, we present a two-step approach. First, biographical information and compatibility information are extracted to form an initial set of coreference chains. Second, a clustering algorithm based on the Vector Space Model (VSM) is applied to merge the chains. For place names, we propose combining the extraction of document-level information with the VSM-based clustering algorithm.
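The chain-merging step can be illustrated with a minimal sketch. The abstract does not state the features, term weighting, similarity measure, or threshold actually used, so everything here is an assumption made for illustration: raw term counts as vectors, cosine similarity, a greedy most-similar-pair merge, and the hypothetical names `cosine`, `merge_chains`, and `threshold`.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def merge_chains(chains, threshold=0.5):
    """Greedily merge coreference chains whose context vectors are similar.

    chains: list of (doc_ids, context_tokens) pairs, one per initial chain.
    threshold: assumed cut-off; the thesis reports tuning this value.
    Returns a list of sorted doc-id lists, one per merged chain.
    """
    clusters = [(set(ids), Counter(tokens)) for ids, tokens in chains]
    while True:
        # Find the most similar pair that still meets the threshold.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = cosine(clusters[i][1], clusters[j][1])
                if sim >= threshold and (best is None or sim > best[0]):
                    best = (sim, i, j)
        if best is None:
            break
        _, i, j = best
        # Merge cluster j into cluster i: union the ids, add the vectors.
        merged = (clusters[i][0] | clusters[j][0],
                  clusters[i][1] + clusters[j][1])
        del clusters[j]          # j > i, so index i is still valid
        clusters[i] = merged
    return [sorted(ids) for ids, _ in clusters]


# Toy input: three single-document chains for the same surface name.
example = [
    (["d1"], ["professor", "physics", "university"]),
    (["d2"], ["physics", "professor", "lecture"]),
    (["d3"], ["football", "striker", "goal"]),
]
print(merge_chains(example, threshold=0.5))  # [['d1', 'd2'], ['d3']]
```

In this toy input the first two chains share two of three terms (cosine similarity 2/3) and are merged, while the third shares none and stays separate.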
In addition, because no corpus for Chinese cross-document coreference resolution was available, we collected 113 personal-name documents (all containing the name "ZhangWei") and 30 place-name documents (all containing the place name "TongZhou") from search engines; these documents were preprocessed, manually proofread, and checked. We evaluate our system with the B-CUBED algorithm. On the "ZhangWei" corpus we obtain an F-measure of 95.71%, with precision and recall of 92.41% and 99.25%; on the "TongZhou" corpus we obtain an F-measure of 89.30%, with precision and recall of 100% and 80.66%.

In particular, we show that system performance is affected by the choice and combination of features, the similarity calculation method, the threshold interval, and the biographical, compatibility, and document-level information used. We also discuss the relationship between Chinese coreference resolution and Chinese cross-document coreference resolution. By comparing the experimental results, we propose a way to avoid some errors in Chinese cross-document coreference resolution. The experimental results show that our approach is promising and performs well.
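For readers unfamiliar with the evaluation metric, the following is a minimal sketch of B-CUBED scoring in its standard per-item form, with the balanced F-measure as the harmonic mean of precision and recall. The function names `b_cubed` and `f_measure` and the dictionary input format are assumptions for illustration; the thesis's own scoring code is not shown in the abstract.

```python
from collections import defaultdict


def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def b_cubed(system, gold):
    """B-CUBED precision, recall, and F-measure.

    system, gold: dicts mapping each item id (e.g. a document) to a
    cluster label. Both must cover exactly the same items.
    """
    if system.keys() != gold.keys():
        raise ValueError("system and gold must label the same items")

    def group(labels):
        clusters = defaultdict(set)
        for item, label in labels.items():
            clusters[label].add(item)
        return clusters

    sys_clusters = group(system)
    gold_clusters = group(gold)

    precision_sum = 0.0
    recall_sum = 0.0
    for item in system:
        sys_set = sys_clusters[system[item]]
        gold_set = gold_clusters[gold[item]]
        correct = len(sys_set & gold_set)
        precision_sum += correct / len(sys_set)   # per-item precision
        recall_sum += correct / len(gold_set)     # per-item recall

    n = len(system)
    precision = precision_sum / n
    recall = recall_sum / n
    return precision, recall, f_measure(precision, recall)


# Consistency check against the figures reported above.
print(round(f_measure(0.9241, 0.9925), 4))  # 0.9571
print(round(f_measure(1.0, 0.8066), 4))     # 0.893
```

The two printed values agree with the reported F-measures of 95.71% and 89.30%, which confirms that those figures are the balanced harmonic mean of the stated precision and recall.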
Keywords/Search Tags: Coreference Resolution, Chinese Cross-Document Coreference Resolution, Biographical Information, Compatibility Information, Document-level Information, Vector Space Model, B-CUBED algorithm