Query processing in heterogeneous information systems

Posted on:1998-10-18

Degree:Ph.D

Type:Thesis

University:Stanford University

Candidate:Papakonstantinou, Ioannis G

Full Text:PDF

GTID:2468390014476427

Subject:Computer Science

Abstract/Summary:

The thesis presents a system that provides integrated access to heterogeneous information sources, which may contain unstructured or semistructured data not described by a regular schema (e.g., the World Wide Web). The sources may have different and limited query capabilities, and complete knowledge of their contents and structure may not be available.

First, an abstraction is proposed for representing semistructured sources. Then, a query translation scheme is proposed for the rapid development of wrappers, i.e., agents that translate queries expressed in the common data model into queries in the native language of the underlying information source. The implementor provides a description of the (potentially limited) set of queries supported by the wrapper, along with actions that perform the translation.

Finally, an object-oriented logic is proposed for the declarative specification of mediators, i.e., agents that create integrated views of the data exported by the wrappers. The mediators can fuse data from semistructured sources and/or sources with changing schemas; indeed, the implementor does not need complete knowledge of the sources' schemas. The thesis presents and evaluates key query decomposition and optimization techniques that significantly reduce the cost of information fusion in this environment. In addition, it presents an algorithm, run by the mediator, that, given descriptions of the (potentially limited) sets of queries supported by the underlying wrappers, develops plans that retrieve the needed data using supported queries only. The descriptions may or may not be schema specific, and they can describe very large or even infinite sets of "query patterns".

Most of the proposed system has been implemented as part of the TSIMMIS project at Stanford University and integrates information from relational databases, semistructured files, and legacy systems. Part of the work was done for the Garlic project at IBM Almaden.
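
The abstract describes wrappers whose supported queries are given as patterns with translation actions, and a mediator that plans using supported queries only. The Python sketch below illustrates that idea under simplifying assumptions; it is not the thesis's algorithm or data model. All names (`Obj`, `Template`, `plan`, `apply_residual`) and the native query syntax `find author=...` are hypothetical. Each template accepts only a conjunction of equality conditions on a fixed attribute set, the mediator pushes down the usable template that binds the most conditions, and it evaluates the remaining conditions itself. A finite list of templates cannot capture the very large or infinite pattern sets the thesis supports.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Obj:
    """Hypothetical labeled object: an atomic value or a list of subobjects."""
    label: str
    value: Any

    def get(self, label: str) -> Any:
        """Return the value of the first subobject with `label`, else None."""
        if isinstance(self.value, list):
            for sub in self.value:
                if sub.label == label:
                    return sub.value
        return None


@dataclass
class Template:
    """Hypothetical query pattern: equality conditions on exactly `attrs`.

    `translate` is the action that turns the bound values into a query in
    the source's (hypothetical) native language.
    """
    attrs: frozenset
    translate: Callable[[dict], str]


def plan(conditions: dict, templates: list) -> Optional[tuple]:
    """Choose a supported query for a conjunction of equality conditions.

    Only templates whose attributes are all bound by the query are usable.
    Among those, pick the one that pushes the most conditions to the source.
    Returns (native_query, residual_conditions), or None if no plan exists.
    """
    usable = [t for t in templates if t.attrs <= set(conditions)]
    if not usable:
        return None
    best = max(usable, key=lambda t: len(t.attrs))
    pushed = {a: conditions[a] for a in best.attrs}
    residual = {a: v for a, v in conditions.items() if a not in best.attrs}
    return best.translate(pushed), residual


def apply_residual(results: list, residual: dict) -> list:
    """Mediator-side filter for the conditions the source could not evaluate."""
    return [o for o in results if all(o.get(a) == v for a, v in residual.items())]


# Usage: a hypothetical source searchable by author, or by author and year.
templates = [
    Template(frozenset({"author"}),
             lambda b: f"find author={b['author']}"),
    Template(frozenset({"author", "year"}),
             lambda b: f"find author={b['author']} and year={b['year']}"),
]
query = {"author": "Smith", "year": 1997, "title": "Mediators"}
native, residual = plan(query, templates)
# native   == "find author=Smith and year=1997"
# residual == {"title": "Mediators"}

# Pretend the source returned these objects for `native`.
results = [
    Obj("book", [Obj("author", "Smith"), Obj("year", 1997), Obj("title", "Mediators")]),
    Obj("book", [Obj("author", "Smith"), Obj("year", 1997), Obj("title", "Wrappers")]),
]
answer = apply_residual(results, residual)  # keeps only the "Mediators" book
```

Picking the template that binds the most conditions is one simple heuristic for reducing the data shipped to the mediator; it is an assumption of this sketch. A query on `title` alone has no plan here (`plan` returns `None`), because the source cannot be searched without an author. A realistic planner would also weigh costs and consider combining several source queries.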

Keywords/Search Tags:

Information, Data, Semistructured, Query, Sources

Related items

1	Structured, unstructured, and semistructured search in semistructured databases
2	Implementing query processing using views in semistructured databases
3	A framework for ranking data sources and query processing sites in database middleware systems
4	Query and data mapping across heterogeneous information sources
5	Integrated Query Processing Over Autonomous Heterogeneous Data Sources
6	Research Of Semistructured Data Index Technology Based On XML
7	Research And Application Of Controllable Query For Multiple Data Sources
8	Data Management And Integration For XML-Based Semi-Structured Data
9	Querying autonomous, heterogeneous information sources
10	Integrating Deep Web data sources