Combining schema and instance information for integrating heterogeneous databases: An analytical approach and empirical evaluation

Posted on:2003-02-13

Degree:Ph.D

Type:Dissertation

University:The University of Arizona

Candidate:Zhao, Huimin

Full Text:PDF

GTID:1468390011483632

Subject:Information Science

Abstract/Summary:

Determining the semantic correspondences among heterogeneous data sources is critical to their semantic integration, but it is a complex and resource-intensive task that demands automated support. In this dissertation, we propose a comprehensive approach to detecting both schema-level and instance-level semantic correspondences among heterogeneous data sources. Correspondences on the two levels are identified alternately and incrementally in an iterative procedure. First, statistical cluster analysis methods and the Self-Organizing Map (SOM) neural network method are used to identify similar schema elements (i.e., relations and attributes). Based on the identified schema-level correspondences, classification techniques drawn from statistical pattern recognition, machine learning, and artificial neural networks are then used to identify matching tuples. Multiple classifiers are combined in various ways, such as bagging, boosting, concatenating, and stacking, to improve classification accuracy. Statistical analysis techniques, such as correlation and regression, are then applied to a preliminary integrated data set to evaluate the relationships among schema elements more accurately. The improved schema-level correspondences are fed back into the identification of instance-level correspondences, closing the loop in the overall procedure. An empirical evaluation using real-world and simulated data is described, demonstrating the utility of the proposed multi-level, multi-technique approach to detecting semantic correspondences among heterogeneous data sources.
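
As an illustration only, the correlation step mentioned in the abstract can be sketched as follows. This is not the dissertation's implementation: the function names (`pearson`, `attribute_correspondences`), the attribute names, the toy data, and the 0.9 threshold are all assumptions made for this example. Given tuples that have already been matched across two sources, the sketch computes the Pearson correlation for each cross-source attribute pair and keeps the strongly correlated pairs as candidate schema-level correspondences.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: treat as carrying no evidence
    return cov / sqrt(var_x * var_y)


def attribute_correspondences(source_a, source_b, matched_pairs, threshold=0.9):
    """Return (attr_a, attr_b, r) for attribute pairs whose |r| >= threshold.

    source_a, source_b: lists of dicts, one dict per tuple (hypothetical layout).
    matched_pairs: (index_in_a, index_in_b) pairs judged to be the same entity.
    """
    candidates = []
    for attr_a in source_a[0]:
        for attr_b in source_b[0]:
            # Align the two columns through the matched tuples.
            col_a = [source_a[i][attr_a] for i, _ in matched_pairs]
            col_b = [source_b[j][attr_b] for _, j in matched_pairs]
            r = pearson(col_a, col_b)
            if abs(r) >= threshold:
                candidates.append((attr_a, attr_b, r))
    return candidates


# Toy data (assumed): source B stores pay in thousands and lists the same
# people in reverse order, so matched_pairs carries the alignment.
source_a = [
    {"salary": 40000, "age": 50},
    {"salary": 50000, "age": 30},
    {"salary": 60000, "age": 45},
    {"salary": 70000, "age": 28},
]
source_b = [
    {"annual_pay_k": 70.0, "dept_code": 2},
    {"annual_pay_k": 60.0, "dept_code": 1},
    {"annual_pay_k": 50.0, "dept_code": 1},
    {"annual_pay_k": 40.0, "dept_code": 3},
]
matched_pairs = [(0, 3), (1, 2), (2, 1), (3, 0)]

for attr_a, attr_b, r in attribute_correspondences(source_a, source_b, matched_pairs):
    print(f"{attr_a} <-> {attr_b}: r = {r:.3f}")
```

In this toy setting only the salary and annual-pay attributes clear the threshold, even though their names and units differ. In an iterative procedure like the one the abstract describes, such candidates would be fed back to refine the instance-level matching.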

Keywords/Search Tags:

Heterogeneous data, Semantic correspondences, Approach, Schema

Related items

1	A semantic analysis of XML schema matching for B2B systems integration
2	Ontology-based Semantic XML Description And Application Of Heterogeneous Data In Enterprise
3	Research On Heterogeneous DNA Data Based On XML
4	A Semantic-Based Approach To Translating Relational Data To XML Data
5	Research And Application Of Data Exchange Based On XML
6	Schema Free Querying of Semantic Data
7	Dynamic schema evolution in a heterogeneous database environment: A graph theoretic approach
8	Ontology mapping neural network: An approach to learning and inferring correspondences among ontologies
9	Research On Schema Matching Technology Supporting Massive Heterogeneous Data Integration
10	Research And Application For The Semantic Matching Of XML Tags