Research And Application Of Heterogeneous Data Integration Based On Ontology In Data Warehouses

Posted on: 2011-07-21  Degree: Master  Type: Thesis
Country: China  Candidate: J M Cai  Full Text: PDF
GTID: 2248330395457859  Subject: Computer application technology
Abstract/Summary:
The traditional ETL scheme in a data warehouse can aggregate data only at the level of syntax and structure; it cannot solve the problems of sharing, reusing, and aggregating data at the semantic level. The ETL process is complex and has no intelligence, and semantic heterogeneity leads to poor integration of distributed data. Moreover, data warehouse developers come to doubt the ETL procedure, its programs, and the quality of the data flow. In addition, a traditional ETL procedure is designed solely according to the designers' own understanding, with no unified standard. It is therefore hard to make compatible with other data sources, has poor extensibility and reusability, and leads to new "data islands".
Ontology describes the semantics of concepts precisely and supports strong reasoning, so an ontology-based ETL can avoid the shortcomings of traditional ETL. In this thesis, we address the use of ontology to facilitate data integration in a data warehouse. In particular, we present an ontology-based approach that facilitates the construction of the ETL workflow and solves the problem of semantic heterogeneity in the data warehouse.
We present an ontology-based data integration framework, whose main contents are as follows. First, we study the construction of the application vocabulary and design an algorithm for annotating the data sources and the data warehouse. Second, we introduce a method for generating the application ontology from a common vocabulary. Third, we study ontology mapping and reasoning and provide a method to derive the ETL process automatically. Finally, we demonstrate the effectiveness and efficiency of our framework in the "Digital Ocean" application.
The results show that users can play an active role in designing the system by using domain terms. Furthermore, the framework solves the semantic heterogeneity problem by annotating the data stores. Through ontology reasoning, we can obtain implicit semantic relations, verify the ontology structure, and check the consistency of the ontology. The ontology-based ETL procedure overcomes the shortcomings of traditional ETL; it also supports communication between designers and users and improves the efficiency of system analysis and modeling.
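To make the idea of annotation-driven ETL derivation concrete, the following is a minimal sketch and not the algorithm from the thesis. It assumes that each column of a data store has been annotated with a term from a shared vocabulary, and it replaces ontology mapping and reasoning with plain term matching. The class `Annotation`, the function `derive_etl_steps`, and all column names, terms, and units are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Annotation:
    """Links one column of a data store to a shared vocabulary term."""
    column: str
    term: str                   # hypothetical vocabulary term
    unit: Optional[str] = None  # hypothetical unit label, if any


# (operation, source column or None, target column)
Step = Tuple[str, Optional[str], str]


def derive_etl_steps(source: List[Annotation], target: List[Annotation]) -> List[Step]:
    """Match each target column to the source column sharing its term.

    Plain term equality stands in for ontology mapping and reasoning here.
    """
    by_term = {a.term: a for a in source}
    steps: List[Step] = []
    for t in target:
        s = by_term.get(t.term)
        if s is None:
            # No source column carries this term.
            steps.append(("unmapped", None, t.column))
        elif s.unit != t.unit:
            # Same concept, different unit: a conversion step is needed.
            steps.append(("convert", s.column, t.column))
        else:
            steps.append(("copy", s.column, t.column))
    return steps


# Hypothetical annotations for one source table and one warehouse table.
source = [
    Annotation("sst_f", "SeaSurfaceTemperature", "fahrenheit"),
    Annotation("stn", "StationId"),
]
target = [
    Annotation("temperature_c", "SeaSurfaceTemperature", "celsius"),
    Annotation("station_id", "StationId"),
    Annotation("salinity", "Salinity", "psu"),
]

steps = derive_etl_steps(source, target)
for step in steps:
    print(step)
```

In this sketch the annotations carry all the semantics, so columns with unrelated names can still be matched. A real ontology would additionally allow matches through subclass or equivalence relations, which term equality cannot express.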
Keywords/Search Tags: data warehouse, ontology, ETL, semantic integration, mapping and reasoning