Research On Wikipedia-based Chinese Cross-document Co-reference Resolution

Posted on:2015-10-18

Degree:Master

Type:Thesis

Country:China

Candidate:X M Xu

GTID:2298330428498409

Subject:Software engineering

Abstract/Summary:

As a significant component of information extraction and information fusion, research on Cross-Document Co-reference Resolution (CDCR) has received extensive attention. The main task of entity cross-document co-reference resolution is to solve the problems of name variation and ambiguity. The former means that an entity has multiple names, while the latter means that multiple entities share the same name. The lack of large-scale Chinese cross-document co-reference corpora impedes research on this task. Therefore this paper conducts the following research:
1. Construct a Chinese cross-document co-reference corpus based on Wikipedia, and analyze co-reference phenomena on the corpus, laying the foundation for large-scale CDCR. Statistics show that, compared with the news domain, name variation is a much more severe problem than ambiguity for Wikipedia entities.
2. Implement a CDCR system using the vector space model (VSM) and unsupervised clustering. Experimental results show that the similarity score between mentions has a greater effect on system performance than the similarity between space vectors.
3. Investigate CDCR techniques on a large-scale CDCR corpus. A "divide and conquer" strategy is adopted to partition all mentions of a particular entity type into mutually exclusive blocks, with each block clustered independently, mitigating the time and space complexity that CDCR incurs on a large-scale corpus. Experiments show that the overall CDCR time is significantly reduced while performance remains at a reasonable level.
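
The combination of VSM, unsupervised clustering, and block partitioning described in items 2 and 3 can be illustrated with a minimal sketch. It is not the thesis's implementation: the term-frequency vectors, the single-link merging rule, the threshold of 0.3, the exact-name blocking key, and all function names and sample mentions are assumptions chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_block(vectors, threshold):
    """Single-link clustering inside one block (assumed merging rule)."""
    parent = list(range(len(vectors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    # Merge every pair whose context similarity reaches the threshold.
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i in range(len(vectors)):
        groups[find(i)].append(i)
    return list(groups.values())


def resolve(mentions, block_key, threshold=0.3):
    """Partition mentions into exclusive blocks, then cluster each block alone."""
    blocks = defaultdict(list)
    for idx, (name, _context) in enumerate(mentions):
        blocks[block_key(name)].append(idx)

    clusters = []
    for ids in blocks.values():
        vectors = [Counter(mentions[i][1]) for i in ids]
        for group in cluster_block(vectors, threshold):
            clusters.append(sorted(ids[g] for g in group))
    return sorted(clusters)


# Hypothetical mentions: (name, context tokens).
mentions = [
    ("Li Na", ["tennis", "open", "champion"]),
    ("Li Na", ["tennis", "grand", "slam", "champion"]),
    ("Li Na", ["singer", "album", "song"]),
    ("Wang Fei", ["singer", "album", "concert"]),
]

# Hypothetical blocking key: the exact mention name.
clusters = resolve(mentions, block_key=lambda name: name)
print(clusters)
```

Because pairwise comparison happens only inside a block, the cost drops from quadratic in the number of all mentions to the sum of the quadratic costs of the individual blocks. The trade-off is that mentions placed in different blocks are never compared, so a blocking key as strict as the exact name used here cannot recover name variants; a practical key would need to be more tolerant.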

Keywords/Search Tags:

CDCR, VSM, Clustering, large-scale corpora, Divide and Conquer

Related items

1	Divide And Conquer For SAT Solving
2	Improved KFCM Clustering Algorithms And Its Application On Divide-and-Conquer SVM
3	Research And Implementation Of Ray Tracing Software With Divide And Conquer Accelerating Algorithm
4	Divide And Conquer Strategy In Association Rule Mining
5	Evolutionary Algorithms Based On Divide-and-Conquer Strategy And Their Applications
6	Divide And Conquer Based Discriminative Feature Extraction And Its Applications
7	Divide-and-Conquer Attacks On Digital Chaotic Stream Ciphers
8	Research On The Parallel Algorithms Of 0-1 Knapsack Problem Based On Divide And Conquer
9	Automatic Divide-and-Conquer Based Intelligent Optimization Algorithms And Applications
10	Accelerating Image Super-Resolution Network With Pixel-Level Divide-and-Conquer