
Semantic-based Query In Heterogeneous Information Integration Environment

Posted on: 2007-03-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X G Wang
Full Text: PDF
GTID: 1118360242461943
Subject: Computer application technology
Abstract/Summary:
With the development of computing technology and the widespread use of database systems, a demand has emerged for interoperation and programming across heterogeneous data sources. In the 1990s, researchers proposed structured information integration architectures such as multi-database systems. Over the last decade, however, the amount of information available online has grown at a tremendous rate, and application requirements change frequently. Structured information integration architectures show limitations in the face of this expanding information and these frequently changing requirements: the semantics of information are not processed by machine, and the integration process is complicated and manual. Such architectures cannot meet the needs of new-generation applications.

In traditional integration methods, the developer must manually build mappings between schemas, from the local information source space to the integrated information space. The shared semantics of interoperation are not formalized, and some semantics remain embedded in code logic. If the integrated system later becomes a member of a larger integration system, the semantics hidden in that code logic again become an obstacle to integration. To address these problems, we design a heterogeneous information integration architecture (HISIM) based on the semantics of ontology mapping. In HISIM the shared semantics of interoperation are formalized, so that semantics can be processed by machine. In addition, HISIM automatically constructs the shared semantics of interoperation in response to dynamic application requirements, so the integration process is automated. The new integration method therefore meets application needs better.

Because ontology is an effective method of semantic modeling, it is attracting increasing attention in the field of intelligent information integration.
The semantics missing from interoperation, and the semantics embedded in the application code of information sources, need to be formalized and declared. We give a formal definition of ontology and an ontology algebra to resolve semantic conflicts between information sources. These formalizations improve the correctness, consistency and validity of integration. Since XML has become the de facto standard for data exchange on the Internet, we give an XML-based ontology representation and concept search method.

After presenting an ontology taxonomy and defining domain ontology (DO), user ontology (UO), local ontology (LO) and so on, we describe how to build mapping rules between DO and LO. These rules are expressed in a form that can be processed by machine, which makes the integration process more intelligent and more automated. On this basis we give a semantic association model of the domain ontology, which simulates the physiological characteristics of human memory and association. Following semantic associations in length and breadth, we realize querying of semantic knowledge units and provide the query algorithm. Using this algorithm we can find in the DO the association ontology (AO), which serves as a semantic bridge between LOs. We then provide an algorithm that merges AO, GO and LO into the UO, which is necessary for interoperation.

Query processing is the process of scheduling the query execution plan and combining intermediate results according to the query processing operations. These operations consist of the inter-site operations involved in integration queries. By analyzing the semantics of the ontology mapping we obtain the transformation rules and the query processing operations, and we provide a join tree structure to represent the query processing operations in the integration system. A method for normalizing the join tree into a join normal tree (JNT) is also presented.
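
The idea of an association ontology acting as a semantic bridge between local ontologies can be illustrated with a small sketch. The abstract does not specify the query algorithm, so the following is only one plausible reading: it assumes the domain ontology is stored as an undirected concept graph and uses a plain breadth-first search to find the shortest chain of concepts linking a concept mapped from one local ontology to a concept mapped from another. The function name `find_association_path`, the dictionary layout and the toy concepts are all hypothetical.

```python
from collections import deque


def find_association_path(domain_ontology, source_concepts, target_concepts):
    """Return the shortest concept chain linking any source to any target.

    domain_ontology: dict mapping a concept name to its related concepts
    (an assumed adjacency-list encoding, not the dissertation's own model).
    Returns None when no chain of associations exists.
    """
    targets = set(target_concepts)
    queue = deque((concept, [concept]) for concept in source_concepts)
    seen = set(source_concepts)
    while queue:
        concept, path = queue.popleft()
        if concept in targets:
            return path
        for neighbour in domain_ontology.get(concept, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, path + [neighbour]))
    return None


# Hypothetical toy domain ontology; each edge is a semantic association.
DOMAIN_ONTOLOGY = {
    "Employee": ["Person", "Department"],
    "Person": ["Employee", "Customer"],
    "Customer": ["Person", "Order"],
    "Department": ["Employee"],
    "Order": ["Customer"],
}

if __name__ == "__main__":
    # One local ontology maps to "Employee", another to "Order".
    print(find_association_path(DOMAIN_ONTOLOGY, ["Employee"], ["Order"]))
```

In this toy graph the concepts returned between the two endpoints play the role of the bridge; a real system would also carry the mapping rules attached to each association.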
The concept of a query join graph is introduced for query processing scheduling; the JNT can be equivalently transformed into a query join graph for this purpose. A multi-level parallel scheduling algorithm for query processing based on the query join graph is then presented to improve the concurrency of query execution.

Analyzing the cost parameters of query processing, we present methods to estimate the costs of local data sources and inter-site communication. In general the Cartesian product, which can generate many invalid tuples, is the most costly operation in query processing and should be avoided. Based on a query join graph composed of inter-site joins and outer joins, we offer a linear-order-based static optimization (LOS) algorithm and a statistical-reasoning-based dynamic optimization (SRD) method. We analyze performance through simulations and experiments. The results show that both optimization strategies are effective.
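
To make the point about avoiding Cartesian products concrete, here is a minimal sketch of a greedy linear join order over a join graph. It is not the LOS algorithm itself, whose details the abstract does not give; it only assumes that each relation has an estimated cardinality and that an edge in the join graph means a join predicate exists between two relations. The names `linear_join_order`, `CARDINALITY` and `JOIN_EDGES` and the sample numbers are hypothetical.

```python
def linear_join_order(cardinality, join_edges):
    """Build a linear join order that never needs a Cartesian product.

    cardinality: dict of relation name -> estimated row count (assumed input).
    join_edges: iterable of (left, right) pairs that share a join predicate.
    At each step the next relation must be connected by a join edge to a
    relation already in the order; among those, the smallest is chosen.
    """
    neighbours = {name: set() for name in cardinality}
    for left, right in join_edges:
        neighbours[left].add(right)
        neighbours[right].add(left)

    # Start from the smallest relation; ties are broken by name.
    start = min(cardinality, key=lambda name: (cardinality[name], name))
    order = [start]
    joined = {start}
    while len(order) < len(cardinality):
        frontier = {n for j in joined for n in neighbours[j]} - joined
        if not frontier:
            raise ValueError(
                "join graph is disconnected; a Cartesian product is unavoidable"
            )
        nxt = min(frontier, key=lambda name: (cardinality[name], name))
        order.append(nxt)
        joined.add(nxt)
    return order


# Hypothetical statistics for four relations held at different sites.
CARDINALITY = {"orders": 10000, "customers": 500, "regions": 10, "items": 50000}
JOIN_EDGES = [("orders", "customers"), ("customers", "regions"), ("orders", "items")]

if __name__ == "__main__":
    print(linear_join_order(CARDINALITY, JOIN_EDGES))
```

Restricting each step to relations adjacent to the already joined set is what rules out Cartesian products; picking the smallest candidate is a simple stand-in for a proper cost estimate that would also account for inter-site communication.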
Keywords/Search Tags: Heterogeneous information integration, Semantic model, Ontology, Query processing scheduling, Query optimization