
Schema Mapping-Based Model For Integration Of Heterogeneous Data Sources

Posted on: 2011-03-30
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Zhao
Full Text: PDF
GTID: 2178360305451600
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of enterprise systems and the growing variety of heterogeneous frameworks, how to integrate these heterogeneous applications has become the hottest issue in the current database domain. Data integration must be done before application integration can be implemented. Among the many integration methods, ontology-based data integration stands out and has become an important research topic in the field. Because data sources are heterogeneous and diverse, a variety of conflicts arise during data integration, such as naming conflicts, unit conflicts and order conflicts. Conflicts must first be discovered and then resolved, manually or automatically, according to policies or rules.

To solve these problems, this paper uses the semantics of ontologies to propose a schema mapping-based model for data integration. The model is constructed from mapping rules and can automatically discover and resolve some types of conflicts, so we name it RCM, short for mapping rule-based conflict-solved model for data integration.

This paper first constructs RCM, which contains a local concept set, a global concept set, a mapping rule set, a conflict set and a constraint set. The last three are the core of the model and are described in the form of mapping documents. Then an extended algorithm is proposed to discover and resolve conflicts. Finally, the paper illustrates how to implement the RCM framework.

For data source description, each local data source is described by a separate ontology that expresses its semantics. To compare the differences between ontologies, a common vocabulary is built that contains almost all terms in the domain. It serves as the ontology of the global source and covers the local concept set and global concept set of RCM.
Mapping documents are then written in OWL to describe the mapping relations between the global and local sources; they cover the mapping rule set, conflict set and constraint set of RCM. This paper studies traditional conflict-resolution algorithms and common query-rewriting algorithms, analyses their shortcomings and proposes a new algorithm for discovering and resolving conflicts. The algorithm scans the mapping documents, automatically discovers conflicts and identifies the data sources and concepts that cause them, and then updates the conflict information in the mapping documents. During query rewriting, the modified documents are used to resolve data-level and semantic-level conflicts, such as unit, representation and naming conflicts, and to guarantee that the final result is correct.

Finally, the framework extended from RCM is introduced, which consists of a user interface, query processing, mapping document processing and result extraction. It combines the advantages of GLAV, ontologies and other approaches, and gives a workable implementation of each component.
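The idea of scanning mapping rules for conflicts and then using the annotated rules while translating data can be illustrated with a minimal Python sketch. This is not the thesis's algorithm or its OWL mapping-document format: the `MappingRule` fields, the `CONVERTERS` table, and the functions `discover_conflicts` and `to_global` are all hypothetical names chosen for illustration, and only naming and unit conflicts are covered.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class MappingRule:
    """One hypothetical mapping rule from a local concept to a global concept."""
    source: str                        # name of the local data source
    local_concept: str                 # attribute name used by the local source
    global_concept: str                # term in the common (global) vocabulary
    local_unit: Optional[str] = None   # unit used by the local source, if any
    global_unit: Optional[str] = None  # unit expected by the global schema
    conflicts: list = field(default_factory=list)  # filled in by discover_conflicts

# Assumed unit-conversion table; a real system would derive this from constraints.
CONVERTERS = {
    ("cm", "m"): lambda v: v / 100,
    ("g", "kg"): lambda v: v / 1000,
}

def discover_conflicts(rules):
    """Scan the rules, annotate each one with its conflicts, and report them.

    Returns a list of (source, local_concept, conflict_kind) tuples so the
    caller can see which sources and concepts cause each conflict.
    """
    found = []
    for rule in rules:
        rule.conflicts = []
        if rule.local_concept != rule.global_concept:
            rule.conflicts.append("naming")
        if rule.local_unit != rule.global_unit:
            rule.conflicts.append("unit")
        for kind in rule.conflicts:
            found.append((rule.source, rule.local_concept, kind))
    return found

def to_global(rules, source, record):
    """Translate one local record into global terms using the annotated rules."""
    result = {}
    for rule in rules:
        if rule.source != source or rule.local_concept not in record:
            continue
        value = record[rule.local_concept]
        if "unit" in rule.conflicts:
            # Resolve the unit conflict by converting to the global unit.
            value = CONVERTERS[(rule.local_unit, rule.global_unit)](value)
        # Writing under the global name resolves the naming conflict.
        result[rule.global_concept] = value
    return result
```

In this sketch, discovery and resolution are separate passes, mirroring the abstract's description: the rules are annotated once, and the annotations are then consulted each time a record (or, in a fuller system, a rewritten query result) is translated into the global vocabulary.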
Keywords/Search Tags: data integration, conflicts solution, ontology, schema mapping