
Research On Chinese Cross-Document Co-reference Resolution

Posted on: 2014-04-17
Degree: Master
Type: Thesis
Country: China
Candidate: D F Huang
Full Text: PDF
GTID: 2298330431973711
Subject: Computer application technology
Abstract/Summary:
Cross-document coreference occurs when multiple names across multiple documents refer to the same entity. Its two main problems are name variation (one entity referred to by different names) and name ambiguity (one name shared by different entities). Accordingly, cross-document coreference resolution (CDCR) comprises two tasks: name variation aggregation and name disambiguation. This thesis first describes the construction of a CDCR corpus and then investigates techniques for CDCR based on unsupervised clustering algorithms. The main content is as follows:
(1) It describes the basic concepts of CDCR, defines its specific tasks, and introduces several clustering algorithms and performance evaluation methods in current use.
(2) Based on the ACE 2005 Chinese corpus, it builds a Chinese cross-document coreference resolution corpus and analyzes the phenomena of name variation and name ambiguity, laying the foundation for the subsequent experiments.
(3) Using this corpus, it studies CDCR with the vector space model and unsupervised clustering algorithms, and carries out parameter optimization and performance analysis for various entity types. In addition, based on the morphological characteristics of entity names, it combines entity mention features with vector space features for unsupervised clustering-based CDCR.
The experimental results show that entity mention features are more conducive to CDCR than vector space features, and that performance reaches as high as about 90% for a variety of entity types (such as PER, ORG and GPE).
Keywords/Search Tags: cross-document coreference resolution, information extraction, SVM classifiers, clustering algorithms
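The combination of entity mention features and vector space features described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the character-level Jaccard overlap used as the mention feature, the raw term-count context vectors, the weight `alpha`, the `threshold`, the single-link threshold clustering that stands in for the unspecified clustering algorithms, and the toy mentions themselves.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def name_overlap(x, y):
    """Character-level Jaccard overlap between two names (assumed mention feature)."""
    sx, sy = set(x), set(y)
    union = sx | sy
    return len(sx & sy) / len(union) if union else 0.0


def cluster(mentions, alpha=0.4, threshold=0.5):
    """Single-link threshold clustering over (name, context_tokens) pairs.

    alpha weights the name feature against the context vector; both alpha
    and threshold are hypothetical values, not tuned parameters.
    Returns clusters as sorted lists of mention indices.
    """
    vectors = [Counter(context) for _, context in mentions]
    parent = list(range(len(mentions)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(mentions)), 2):
        sim = (alpha * name_overlap(mentions[i][0], mentions[j][0])
               + (1 - alpha) * cosine(vectors[i], vectors[j]))
        if sim >= threshold:
            parent[find(j)] = find(i)  # merge the two clusters

    groups = {}
    for i in range(len(mentions)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy data: one name-variation pair and one ambiguous name.
mentions = [
    ("北京大学", ["大学", "招生", "教授"]),
    ("北大", ["大学", "教授", "研究"]),
    ("李明", ["篮球", "比赛", "球员"]),
    ("李明", ["银行", "行长", "金融"]),
]
print(cluster(mentions))
```

With these toy inputs, the two university mentions share both characters and context words, so they are merged (name variation aggregation), while the two mentions of 李明 share a name but no context words and stay apart (name disambiguation). Changing `alpha` shifts how much the name feature counts relative to the context vector.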