A new hierarchical clustering model for speeding up the reconciliation of XML-based, semistructured data in mediation systems

Posted on:2002-02-27

Degree:Ph.D

Type:Dissertation

University:University of Florida

Candidate:Pluempitiwiriyawej, Charnyote

Full Text:PDF

GTID:1468390011992939

Subject:Computer Science

Abstract/Summary:


This dissertation describes the underlying research, design, and implementation of a Data Merge Engine (DME). Specifically, we have developed a hierarchical clustering model as a new way to speed up the merging of similar and overlapping data items from multiple information sources. We use a tree-based heuristic algorithm to cluster data in a multidimensional metric space. Equivalence of data objects within each cluster is determined by a number of distance functions that calculate the semantic distances among the objects based on their attribute values. Because the number of data items to be compared can vary widely, we have also developed a set of heuristics for reconciling data items appropriately. Our experimental results show that this approach is more efficient and produces more accurate results than other existing approaches.

Given the immense popularity of the World Wide Web (Web), we focus mainly on reconciling semistructured data. Specifically, we use the Extensible Markup Language (XML) as our internal data model for representing heterogeneous data. As part of this research, we have developed a comprehensive classification of the schematic and semantic conflicts that can occur when merging data from related XML-based information sources.

The research described here was conducted within the context of the Integration Wizard (IWIZ) system, which allows users to access and retrieve information from multiple sources through a consistent, integrated view. To improve query response time, IWIZ combines mediation and data warehousing in its approach to information integration.
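The first paragraph of the abstract describes a "cluster first, then compare" strategy: group records with a tree-based heuristic in a metric space, then apply attribute-level distance functions only to records that share a cluster. The abstract does not specify the DME's actual tree algorithm, distance functions, or thresholds, so the sketch below is a stand-in under stated assumptions. It uses a vantage-point-style divisive split (pivot plus median distance) as the tree heuristic, a normalized Levenshtein distance for strings, and a capped numeric difference for numbers. The record fields (`id`, `title`, `year`), `leaf_size`, and `threshold` are all hypothetical.

```python
from itertools import combinations
from typing import Callable

Record = dict  # hypothetical flat record: attribute name -> value
Distance = Callable[[Record, Record], float]


def string_distance(a: str, b: str) -> float:
    """Levenshtein edit distance normalized to [0, 1], case-insensitive."""
    a, b = a.lower(), b.lower()
    if not a and not b:
        return 0.0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1] / max(len(a), len(b))


def numeric_distance(x: float, y: float, scale: float) -> float:
    """Absolute difference divided by scale, capped at 1."""
    return min(abs(x - y) / scale, 1.0)


def record_distance(r: Record, s: Record) -> float:
    """Hypothetical record distance: mean of the per-attribute distances."""
    return (string_distance(r["title"], s["title"])
            + numeric_distance(r["year"], s["year"], scale=10.0)) / 2


def partition(records: list[Record], dist: Distance,
              leaf_size: int) -> list[list[Record]]:
    """Divisive, tree-shaped clustering: split around a pivot at the median
    distance (vantage-point style) until every leaf has <= leaf_size records."""
    if len(records) <= leaf_size:
        return [records]
    pivot = records[0]
    ds = [dist(pivot, r) for r in records]
    median = sorted(ds)[len(ds) // 2]
    inner = [r for r, d in zip(records, ds) if d < median]
    outer = [r for r, d in zip(records, ds) if d >= median]
    if not inner or not outer:  # ties at the smallest distance: stop splitting
        return [records]
    return partition(inner, dist, leaf_size) + partition(outer, dist, leaf_size)


def find_duplicates(records: list[Record], dist: Distance,
                    leaf_size: int = 4, threshold: float = 0.2):
    """Compare records pairwise only inside each leaf cluster."""
    matches = []
    for leaf in partition(records, dist, leaf_size):
        for a, b in combinations(leaf, 2):
            if dist(a, b) <= threshold:
                matches.append((a["id"], b["id"]))
    return matches


# Illustrative records from two hypothetical sources (A and B).
records = [
    {"id": "A1", "title": "Database Systems", "year": 1999},
    {"id": "B7", "title": "Database Sytems", "year": 1999},
    {"id": "A2", "title": "Compiler Design", "year": 1986},
    {"id": "B3", "title": "Compilers Design", "year": 1986},
    {"id": "A3", "title": "Operating Systems", "year": 2001},
]
print(find_duplicates(records, record_distance))  # [('A1', 'B7'), ('A2', 'B3')]
```

Only pairs inside the same leaf are compared, so the number of distance evaluations falls well below the n(n-1)/2 that an all-pairs comparison requires. The trade-off is that near-duplicates that land in different leaves are never compared. The abstract does not say how the DME handles that case.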

Keywords/Search Tags:

Data, Clustering, Model, Information


Related items

1	The Research Of Application And Optimization Of Gaussian Mixture Model In Data Clustering
2	Research On Clustering Algorithms For Incomplete Data
3	Research On Internet Information Collection And Processing Technology
4	Ensemble Clustering For Mixed Data Via Combining Content And Structure Information
5	A Clustering Rule Based Approach for Classification Problems
6	Research On Clustering Algorithms For Large-scale Complex Data
7	Research And Application Of Web Text Information Clustering Algorithm
8	Research On Mixed Data Clustering Methods Based On Density Peaks And Dimensional Probability Model
9	Research And Application Of New Methods In Symbolic Clustering
10	GIS-Based Spatial Clustering Algorithm, The Research And Application