
Research On Named Entity Disambiguation In Deep Web

Posted on: 2011-06-27 | Degree: Master | Type: Thesis
Country: China | Candidate: R Zhang | Full Text: PDF
GTID: 2248330395458024 | Subject: Computer software and theory
Abstract/Summary:
Because Web databases are heterogeneous and autonomous, integrating the results extracted from them is a challenging task. Entity records in the Deep Web are often redundant or appear in multiple forms, so they fall short of data quality requirements (consistency, accuracy and completeness). Named entity disambiguation is therefore an essential part of data cleaning during data integration. In this thesis, we put forward a named entity disambiguation model for the Deep Web, which consists of a data preprocessing module and a similar-entity clustering module.

First, we use XML Schema to describe the global schema of entity records in data integration. This gives the attribute information of entity records a unified format and resolves schema conflicts between heterogeneous data sources; in addition, by defining different data types, we obtain normalized data.

Second, we propose a cluster-based named entity disambiguation algorithm with three parts: entity attribute similarity calculation, entity record similarity calculation and similar entity clustering. Taking entity records in the global schema as input, we define constraint rules according to domain knowledge and choose a suitable similarity algorithm to calculate the similarity of each entity attribute. We then calculate entity record similarity according to the feature weights. Finally, based on the entity similarity matrix, we employ the Affinity Propagation clustering algorithm to disambiguate the named entity records. The output is a set of distinct clusters, each representing a single real-world entity.

With this algorithm, we obtain consistent and accurate entity record information, which helps to improve data quality when integrating multiple data sources in the Deep Web and improves the user experience. The experimental results show that our named entity disambiguation model is feasible and efficient.
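
The first two parts of the algorithm described above (attribute similarity, then weighted record similarity) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the thesis: the attribute names, the weights in `WEIGHTS`, the exact-match rule for `year`, the use of `difflib.SequenceMatcher` as the string similarity, and the invented sample records.

```python
from difflib import SequenceMatcher

# Hypothetical feature weights; the abstract does not give the actual values.
WEIGHTS = {"title": 0.6, "author": 0.3, "year": 0.1}


def attribute_similarity(name, a, b):
    """Return a similarity in [0, 1] for one attribute of two records."""
    if name == "year":
        # Assumed constraint rule: years either match exactly or not at all.
        return 1.0 if a == b else 0.0
    # Generic, case-insensitive string similarity for text attributes.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def record_similarity(r1, r2, weights=WEIGHTS):
    """Combine attribute similarities into one score using feature weights."""
    return sum(
        weight * attribute_similarity(name, r1[name], r2[name])
        for name, weight in weights.items()
    )


def similarity_matrix(records):
    """Build the pairwise entity similarity matrix as a list of lists."""
    n = len(records)
    return [
        [record_similarity(records[i], records[j]) for j in range(n)]
        for i in range(n)
    ]


# Invented sample records, already mapped to one shared (global) schema.
records = [
    {"title": "Introduction to Data Cleaning", "author": "J. Smith", "year": "2009"},
    {"title": "introduction to data cleaning", "author": "John Smith", "year": "2009"},
    {"title": "Web Database Schema Matching", "author": "L. Chen", "year": "2010"},
]

matrix = similarity_matrix(records)
for row in matrix:
    print(["%.2f" % value for value in row])
```

In this sketch the first two records, which differ only in letter case and author spelling, score much closer to each other than either does to the third. A matrix of this kind could then be handed to an Affinity Propagation implementation that accepts precomputed similarities, for example scikit-learn's `AffinityPropagation` with `affinity="precomputed"`; that clustering step is left out here.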
Keywords/Search Tags: Deep Web, named entity disambiguation, XML Schema, similarity, Affinity Propagation clustering algorithm