Research And Implementation Of Chinese Cross-Document Coreference Resolution

Posted on:2011-10-15

Degree:Master

Type:Thesis

Country:China

Candidate:C S Lu

Full Text:PDF

GTID:2178360305976427

Subject:Computer application technology

Abstract/Summary:

Cross-document coreference resolution is an important topic in natural language processing and a key component of application systems such as information extraction (IE), information retrieval (IR), and multi-document summarization. Over the past decade, research focused mainly on coreference resolution within a single text. As the technology has progressed, cross-document coreference resolution has attracted increasing attention: because it builds coreference chains across documents, more information about entities can be obtained from the texts. In addition, useful information obtained from cross-document coreference resolution can be fed back into single-document coreference resolution, which may help the latter achieve a breakthrough.

As the study of Chinese cross-document coreference resolution is still in its infancy, this thesis gives a detailed introduction to research on English cross-document coreference resolution. Drawing on that work, a platform for Chinese cross-document coreference resolution is designed. The research covers two parts: Chinese cross-document coreference resolution on personal names and on place names. We present a two-step approach for personal names: first, biographical information and compatibility information are extracted to form an initial set of coreference chains; second, a clustering algorithm based on the Vector Space Model (VSM) is applied to merge the chains. For place names, we propose a method that combines the extraction of document-level information with the VSM-based clustering algorithm.
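The second step of the personal-name approach, merging chains with VSM-based clustering, can be illustrated with a minimal sketch. The abstract does not specify the term weighting, similarity measure, or threshold, so everything here is an assumption made for illustration: chains are represented as raw term-frequency vectors, similarity is cosine similarity, merging is greedy pairwise, and the names `cosine`, `merge_chains`, and `threshold` are hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def merge_chains(chains, threshold=0.5):
    """Greedily merge the most similar pair of chains until no pair
    reaches the threshold (assumed value, not taken from the thesis).

    chains: list of (document_ids, term_counts) pairs.
    Returns the document-id sets of the merged chains.
    """
    chains = [(set(ids), Counter(vec)) for ids, vec in chains]
    while len(chains) > 1:
        best, pair = threshold, None
        for i in range(len(chains)):
            for j in range(i + 1, len(chains)):
                sim = cosine(chains[i][1], chains[j][1])
                if sim >= best:
                    best, pair = sim, (i, j)
        if pair is None:
            break  # no remaining pair is similar enough
        i, j = pair
        # A merged chain covers both document sets and sums both vectors.
        merged = (chains[i][0] | chains[j][0], chains[i][1] + chains[j][1])
        chains = [c for k, c in enumerate(chains) if k not in (i, j)]
        chains.append(merged)
    return [ids for ids, _ in chains]


# Hypothetical context words around three mentions of the same name.
initial = [
    ({"d1"}, {"professor": 2, "physics": 1}),
    ({"d2"}, {"professor": 1, "physics": 2}),
    ({"d3"}, {"football": 3, "striker": 1}),
]
result = merge_chains(initial, threshold=0.5)
```

In this toy input, the first two chains share context terms and are merged, while the third stays separate. A real system would need Chinese word segmentation and a tuned threshold, both of which the abstract notes affect performance.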
In addition, because no corpus for Chinese cross-document coreference resolution was available, we collected 113 personal-name documents (all containing the name "ZhangWei") and 30 place-name documents (all containing the place name "TongZhou") from search engines; these documents were preprocessed, manually proofread, and checked. We evaluate the system with the B-CUBED algorithm. On the "ZhangWei" corpus, it achieves an F-measure of 95.71%, with precision of 92.41% and recall of 99.25%; on the "TongZhou" corpus, it achieves an F-measure of 89.30%, with precision of 100% and recall of 80.66%.

In particular, we show that the platform's performance can be affected by the choice and combination of features, the similarity calculation method, the threshold interval, and the biographical, compatibility, and document-level information used. We also discuss the relationship between Chinese coreference resolution and Chinese cross-document coreference resolution. By comparing the experimental results, we propose a way to avoid some errors in Chinese cross-document coreference resolution. The experimental results show that our approach is promising and performs well.
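For readers unfamiliar with the B-CUBED evaluation mentioned above, the following is a minimal sketch of the standard per-item formulation: for each document, precision is the share of its system cluster that also belongs to its gold cluster, recall is the share of its gold cluster that the system cluster recovers, and both are averaged over all documents. The function name `b_cubed` and the toy clusters are hypothetical, and the balanced F-measure (harmonic mean of precision and recall) is assumed, since the abstract does not state the weighting.

```python
def b_cubed(system, gold):
    """Return (precision, recall, f_measure) under B-CUBED scoring.

    system, gold: lists of sets of item ids covering the same items,
    with each item appearing in exactly one set per list.
    """
    sys_of = {m: frozenset(c) for c in system for m in c}
    gold_of = {m: frozenset(c) for c in gold for m in c}
    items = list(gold_of)
    precision = sum(
        len(sys_of[m] & gold_of[m]) / len(sys_of[m]) for m in items
    ) / len(items)
    recall = sum(
        len(sys_of[m] & gold_of[m]) / len(gold_of[m]) for m in items
    ) / len(items)
    total = precision + recall
    # Balanced F-measure (assumed): harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Toy example: the system splits one gold cluster into two.
gold = [{"a", "b", "c"}, {"d"}]
system = [{"a", "b"}, {"c"}, {"d"}]
p, r, f = b_cubed(system, gold)
```

Splitting a gold cluster leaves precision perfect but lowers recall, the same pattern as the "TongZhou" result (100% precision, 80.66% recall). Plugging the reported precision and recall into the harmonic-mean formula gives values within rounding of the reported F-measures.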

Keywords/Search Tags:

Coreference Resolution, Chinese Cross-Document Coreference Resolution, Biographical Information, Compatibility Information, Document-level Information, Vector Space Model, B-CUBED algorithm

Related items

1	Research On Cross-document Coreference Of Chinese Person Name
2	Research On Chinese Cross-Document Co-reference Resolution
3	Coreference, cross-document coreference, and information extraction methodologies
4	Coreference Resolution Model Incorporating Chinese Word Segmentation Information
5	Research On Related Technology Of End-to-end Neural Coreference Resolution
6	Research Of Key Issues In Event Coreference Resolution
7	Research On Chinese Coreference Resolution And Its Related Technologies
8	Research Of Coreference Resolution With Semantic Information
9	The Study On Cross-document Chinese Person Name Disambiguation With Coreference Resolution
10	Research Of Key Issues In Coreference Resolution Of Chinese