Querying autonomous, heterogeneous information sources

Posted on: 2001-12-14
Degree: Ph.D.
Type: Thesis
University: Stanford University
Candidate: Vassalos, Vasilios Antoniou
Full Text: PDF
GTID: 2468390014458192
Subject: Computer Science
Abstract/Summary:
A wide variety of information sources are available both in the internal networks of organizations and on the Web. These sources are autonomous, have different and limited query capabilities, and usually contain heterogeneous data that have only partial, flexible, or implicit structure, i.e., data that are semistructured (e.g., XML, bibliographic, or genomic data). Enabling users to query the integrated information contained in these sources is a crucial requirement for increasing the usefulness of the Web as an information resource and for enabling electronic commerce.

An effective system for on-demand integration of such sources must perform two main tasks efficiently in response to a user query. First, it must devise a query plan that locates and retrieves the relevant pieces of information by submitting to the sources localized queries that respect the sources' query capabilities. Second, it must combine the pieces of information to produce a unified answer. This thesis develops powerful query processing techniques and architectures for information integration and studies some of the tradeoffs between the generality of the query language and the efficiency of query processing in an information integration system.

The thesis adopts a powerful framework for the construction of an on-demand integration system, proposed by the TSIMMIS project at Stanford University. In this framework, the core of the integration system is a query processor, called a mediator, that implements the integrated query processing algorithms. The details of an integration scenario, including the virtual integrated views that describe how source information is combined and the contents and query capabilities of the sources, are specified declaratively in a high-level specification language.

The thesis studies logical languages for the specification of integrated views and the description of query capabilities. Both relational and semistructured languages, with and without recursion, are studied from the point of view of expressive power and efficiency. The thesis presents sound and complete algorithms that solve the key problem of generating query plans that respect query capabilities described in these powerful languages (the capability-based rewriting problem). In particular, it presents the first algorithm that solves this problem for a semistructured language.
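The two mediator tasks described above (planning source queries that respect limited capabilities, then combining the results) can be illustrated with a small Python sketch. It is not the thesis's algorithm: it assumes a much simpler capability model in which each source only declares which attributes must be supplied as input, and it uses a greedy ordering. The names `Source`, `plan`, `execute`, and the sample bibliographic tables are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set

Row = Dict[str, str]


@dataclass
class Source:
    """Hypothetical source description: what it returns and what it needs."""
    name: str
    attributes: Set[str]   # attributes present in the tuples it returns
    must_bind: Set[str]    # attributes a query to this source must supply
    fetch: Callable[[Row], List[Row]]


def plan(sources: List[Source], bound: Set[str]) -> Optional[List[Source]]:
    """Order the sources so each one's required inputs are already known.

    Returns None when no capability-respecting order exists.
    """
    known = set(bound)
    remaining = list(sources)
    order: List[Source] = []
    while remaining:
        ready = next((s for s in remaining if s.must_bind <= known), None)
        if ready is None:
            return None
        order.append(ready)
        known |= ready.attributes
        remaining.remove(ready)
    return order


def execute(order: List[Source], bindings: Row) -> List[Row]:
    """Run the plan and join the fetched tuples on shared attributes."""
    rows = [dict(bindings)]
    for source in order:
        next_rows = []
        for row in rows:
            inputs = {a: row[a] for a in source.must_bind}
            for fetched in source.fetch(inputs):
                # Keep only tuples that agree with what is already known.
                if all(row.get(k, v) == v for k, v in fetched.items()):
                    next_rows.append({**row, **fetched})
        rows = next_rows
    return rows


def lookup(table: List[Row]) -> Callable[[Row], List[Row]]:
    """Simulate a source that answers only equality lookups on its inputs."""
    return lambda inputs: [
        r for r in table if all(r[k] == v for k, v in inputs.items())
    ]


# Made-up sample data for two sources with different capabilities.
AUTHOR_TABLE = [
    {"author": "Ullman", "title": "Databases"},
    {"author": "Knuth", "title": "TAOCP"},
]
PRICE_TABLE = [
    {"title": "Databases", "price": "50"},
    {"title": "TAOCP", "price": "90"},
]

authors = Source("authors", {"author", "title"}, {"author"}, lookup(AUTHOR_TABLE))
prices = Source("prices", {"title", "price"}, {"title"}, lookup(PRICE_TABLE))

order = plan([prices, authors], {"author"})
answer = execute(order, {"author": "Ullman"})
```

In this sketch the `prices` source cannot be queried first because it requires a title, so the planner calls `authors` and feeds the titles it returns into `prices`. The languages studied in the thesis describe far richer capabilities than this input-attribute model.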
Keywords/Search Tags: Information, Query, Sources