Research On Named Entity Disambiguation In Deep Web

Posted on:2011-06-27

Degree:Master

Type:Thesis

Country:China

Candidate:R Zhang

Full Text:PDF

GTID:2248330395458024

Subject:Computer software and theory

Abstract/Summary:

Because Web databases are heterogeneous and autonomous, integrating the results extracted from them is challenging. Entity records in the Deep Web are often redundant or appear in multiple forms, so they fall short of data quality requirements (consistency, accuracy and completeness). Named entity disambiguation is therefore an essential part of data cleaning during data integration. In this thesis, we propose a named entity disambiguation model for the Deep Web that consists of a data preprocessing module and a similar entity clustering module.

First, we use XML Schema to describe the global schema of entity records in data integration. This gives the attribute information of entity records a unified format and resolves schema conflicts between heterogeneous data sources; in addition, defining different data types lets us obtain normalized data.

Second, we propose a cluster-based named entity disambiguation algorithm with three parts: entity attribute similarity calculation, entity record similarity calculation and similar entity clustering. Taking entity records in the global schema as input, we define constraint rules from domain knowledge and choose a suitable similarity algorithm to calculate entity attribute similarity. We then calculate entity record similarity according to the feature weights. Finally, based on the entity similarity matrix, we apply the Affinity Propagation clustering algorithm to disambiguate the named entity records. The output is a set of distinct clusters, each representing a single real-world entity.

The algorithm yields consistent and accurate entity record information, which helps improve data quality when integrating multiple data sources in the Deep Web and improves the user experience. The experimental results show that our named entity disambiguation model is feasible and efficient.
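
The three steps named in the abstract (attribute similarity, weighted record similarity, Affinity Propagation clustering) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis implementation. The attribute names, the weights in `WEIGHTS`, the use of `difflib.SequenceMatcher` as the attribute similarity measure, the median preference, and the sample records are all hypothetical, and the domain constraint rules are omitted.

```python
from difflib import SequenceMatcher
from statistics import median

# Hypothetical attribute weights (assumption; the thesis values are not given here).
WEIGHTS = {"title": 0.6, "author": 0.3, "year": 0.1}


def attribute_similarity(a, b):
    """Similarity in [0, 1] between two attribute strings (stand-in measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def record_similarity(r1, r2):
    """Weighted sum of attribute similarities between two records."""
    return sum(w * attribute_similarity(r1[k], r2[k]) for k, w in WEIGHTS.items())


def affinity_propagation(S, damping=0.5, iterations=200):
    """Cluster points from a square similarity matrix S (list of lists).

    S[k][k] is the "preference" of point k to act as an exemplar.
    Returns one exemplar index per point; needs at least two points.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities
    A = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iterations):
        # Responsibility: how well k suits i compared with i's best rival.
        for i in range(n):
            for k in range(n):
                rival = max(A[i][j] + S[i][j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)
        # Availability: support that k has gathered from the other points.
        for k in range(n):
            pos = [max(0.0, R[j][k]) for j in range(n)]
            pos[k] = 0.0  # self-responsibility enters separately below
            total = sum(pos)
            for i in range(n):
                if i == k:
                    new = total
                else:
                    new = min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:  # no exemplar emerged: leave every record on its own
        return list(range(n))
    return [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
            for i in range(n)]


def disambiguate(records):
    """Label each record with the index of its cluster's exemplar record."""
    n = len(records)
    S = [[record_similarity(records[i], records[j]) for j in range(n)]
         for i in range(n)]
    off_diagonal = [S[i][j] for i in range(n) for j in range(n) if i != j]
    preference = median(off_diagonal)  # a common default, not from the thesis
    for k in range(n):
        S[k][k] = preference
    return affinity_propagation(S)


if __name__ == "__main__":
    # Made-up records, already normalized to one schema.
    sample = [
        {"title": "Data Integration Basics", "author": "J. Smith", "year": "2010"},
        {"title": "Data integration basics", "author": "John Smith", "year": "2010"},
        {"title": "Data Integration: Basics", "author": "Smith, J.", "year": "2010"},
        {"title": "Graph Query Processing", "author": "L. Chen", "year": "2008"},
        {"title": "Graph query processing", "author": "Li Chen", "year": "2008"},
        {"title": "Graph Query-Processing", "author": "Chen, L.", "year": "2008"},
    ]
    print(disambiguate(sample))
```

Records that share a label are treated as the same real-world entity. Affinity Propagation does not need the number of clusters in advance; the diagonal preference controls how many exemplars emerge, so in practice it would need tuning for each domain.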

Keywords/Search Tags:

Deep Web, named entity disambiguation, XML Schema, similarity, Affinity Propagation clustering algorithm

Related items

1	Research On Named Entity Recognition And Disambiguation Based On Network Semantic Resource
2	Research And Implementation Of Named Entity Disambiguation Based On Wikipedia
3	Research On Named Entity Recognition And Disambiguation For Short Text
4	Research And Implementation Of Online Entity Disambiguation Based On Entity Gene
5	Research On Multi-Source Named Entity Disambiguation Method For Researchers
6	Research On Cluster-based Person Name Disambiguation
7	Research And Implementation Of Named Entity Recognition Based On Ancient Literature
8	Research And Implemention Of Name Entity Disambiguation
9	Chinese Named Entity Recognition And Disambiguation Research
10	Named Entity Disambiguation Based On Chinese And English Wikipedia Knowledge Base