Efficient query processing for data integration

Posted on: 2003-04-22
Degree: Ph.D
Type: Thesis
University: University of Washington
Candidate: Ives, Zachary George
Full Text: PDF
GTID: 2468390011980605
Subject: Computer Science
Abstract/Summary:
A major problem today is that important data is scattered across dozens of separately evolved data sources, in a form that makes the “big picture” difficult to obtain. Data integration presents a unified virtual view of all data within a domain, allowing the user to pose queries across the complete integrated schema.

This dissertation addresses the performance needs of real-world business and scientific applications. Standard database techniques for answering queries are inappropriate for data integration, where data sources are autonomous and generally lack mechanisms for sharing statistical information about their content, and where the environment is shared with other users and subject to unpredictable change. My thesis proposes the use of pipelined and adaptive techniques for processing data integration queries, and I present a unified architecture for adaptive query processing, including novel algorithms and an experimental evaluation. An operator called x-scan extracts the relevant content from an XML source as it streams across the network, which enables more work to be done in parallel. Next, the query is answered using algorithms (such as an extended version of the pipelined hash join) whose work is adaptively scheduled, varying to accommodate the relative data arrival rates of the sources. Finally, the system can adapt the ordering of the various operations (the query plan), either at points where the data is being saved to disk or in mid-execution, using a novel technique called convergent query processing. I show that these techniques provide significant benefits in processing data integration queries.
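
The pipelined hash join mentioned above can be illustrated with a minimal sketch. This is the textbook symmetric variant, not the extended version developed in the dissertation, and it assumes everything fits in memory. The function name `pipelined_hash_join`, the `"L"`/`"R"` side labels, and the list that simulates network arrival order are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def pipelined_hash_join(arrivals, key_left, key_right):
    """Symmetric (pipelined) hash join over interleaved input.

    arrivals: iterable of (side, row) pairs in the order rows arrive,
              where side is "L" or "R" (hypothetical labels).
    key_left / key_right: functions extracting the join key from a row.
    Yields (left_row, right_row) pairs as soon as a match is possible.
    """
    # One in-memory hash table per input; both are built incrementally.
    tables = {"L": defaultdict(list), "R": defaultdict(list)}
    keys = {"L": key_left, "R": key_right}

    for side, row in arrivals:
        other = "R" if side == "L" else "L"
        k = keys[side](row)
        # Insert the new row into its own table ...
        tables[side][k].append(row)
        # ... then probe the other table, emitting matches immediately.
        for match in tables[other].get(k, []):
            yield (row, match) if side == "L" else (match, row)


# Simulated arrival order: whichever source delivers next gets processed.
arrivals = [
    ("L", (1, "a")),
    ("R", (1, "x")),
    ("R", (2, "y")),
    ("L", (2, "b")),
    ("L", (1, "c")),
]
results = list(pipelined_hash_join(arrivals, lambda r: r[0], lambda r: r[0]))
```

Because neither input has to be read completely before output begins, the join produces results at whatever rate the sources deliver data, which is the property that makes this family of operators a natural fit for adaptive scheduling across sources with differing arrival rates.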
Keywords/Search Tags: Data, Query processing, Queries