LAV in Data Integration System Query Processing

Posted on: 2006-10-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: T B Chen
Full Text: PDF
GTID: 1118360155960656
Subject: Computer software and theory
Abstract/Summary:
A distributed data integration system connects data sources scattered across different sites and provides automatic access to them. It also integrates the underlying data sources into a reconciled and unified global view.

Such a system may benefit many applications, including commercial information integration systems on the Web, governmental information integration systems for the public, and information sharing and cooperation systems for enterprises.

Data warehouses, peer-to-peer structures and mediator structures are all possible implementations of data integration. Our work is based on a mediator-structured system and concentrates on several key issues in query processing. As background for the discussion, the paper begins with an introduction to the architecture of the system, covering its global schema and local schemas. The remaining content forms the major parts of the paper:

1. Query rewriting based on data sources: our system is constructed in LAV (local-as-view) style, which describes data sources as views over the global schema, so query rewriting techniques can be used for query processing. Query rewriting originates from the problem of answering queries using materialized views. It has two independent sources of complexity: finding mappings between the views and the current query, and constructing rewritings from those mappings. Both are NP-complete. Previous work simply enumerated and checked every possibility, which caused a great deal of useless combining and checking. We provide adaptations that reduce this redundancy. For the mappings between the current query and each view, our method not only reduces redundant mappings but also makes use of the Bachman graph; when a certain condition is satisfied, the graph helps to find a unique order in which to compute these mappings. For forming query rewritings from these mappings, our method generates only those combinations that cover all subgoals of the current query. The algorithm is further adapted in this paper for data sources with query capability limitations.

2. Query optimization: the goals of query optimization in a distributed data integration system differ from those in a traditional database system. We first study the optimization of join operations. For ordering the execution of several join operations, we prove that when the aim is to reduce data flow, only linear join trees need to be considered, whereas for improving query response time, both linear and bushy join trees should be included in the execution space searched for query plans. The performance of each data source and of the network changes all the time, so static query plans may not suit this dynamic environment. Surprisingly, this obvious problem is ignored in almost all previous related research. We propose a query processing algorithm combined with optimization strategies, which can adjust the query process according to the current state of the network environment. The other major part of query optimization concerns selection operations: we devise a method to distribute the selection conditions of the current query among different data sources, making as much use as possible of the local capability of each data source.

3. Constructing a datalog program for query answering: to obtain as much data satisfying the current query as possible, we have to process all the query rewritings, which causes many repeated accesses to the same data sources. A better method is to construct a datalog program for the query processing. Another advantage of this method is that the program can make use of what we call the query information path among different data sources. The major contributions of our algorithm are on...
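
The combination step described in item 1 (generating only those combinations of mappings that cover all subgoals of the current query) can be illustrated with a minimal Python sketch. This is not the dissertation's algorithm. The function name `covering_combinations`, the mapping names and the subgoal labels are hypothetical, and the sketch additionally assumes, as MiniCon-style rewriting algorithms do, that the mappings in one combination cover disjoint sets of subgoals.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical input shape: each candidate mapping (named after its view)
# is associated with the set of query subgoals it can cover.
Coverage = Dict[str, FrozenSet[str]]


def covering_combinations(
    subgoals: List[str], coverage: Coverage
) -> List[Tuple[str, ...]]:
    """Enumerate combinations of mappings that cover every subgoal exactly once.

    Assumption (not taken from the abstract): mappings in a combination
    must cover pairwise disjoint subgoal sets.
    """
    results: List[Tuple[str, ...]] = []

    def extend(uncovered: FrozenSet[str], chosen: Tuple[str, ...]) -> None:
        if not uncovered:
            results.append(chosen)
            return
        # Branch only on the first uncovered subgoal, so partial
        # combinations that can never cover it are not explored and
        # each complete combination is produced exactly once.
        target = next(g for g in subgoals if g in uncovered)
        for name in sorted(coverage):
            covered = coverage[name]
            # Keep a mapping only if it covers the target and does not
            # overlap with subgoals that are already covered.
            if target in covered and covered <= uncovered:
                extend(uncovered - covered, chosen + (name,))

    extend(frozenset(subgoals), ())
    return results
```

For example, with subgoals `g1`, `g2`, `g3` and mappings `V1` covering `{g1}`, `V2` covering `{g2}`, `V3` covering `{g1, g2}` and `V4` covering `{g3}`, the sketch yields the two combinations `(V1, V2, V4)` and `(V3, V4)`. Combinations that leave a subgoal uncovered are never assembled, which is the kind of redundancy the abstract says a blind enumerate-and-check approach incurs.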
Keywords/Search Tags: Autonomous data sources, query rewriting, binding pattern, query plan, data source capability, data integration, datalog program, linear join tree, bushy join tree, execution space.