Query processing in heterogeneous information systems

Posted on: 1998-10-18
Degree: Ph.D.
Type: Thesis
University: Stanford University
Candidate: Papakonstantinou, Ioannis G.
GTID: 2468390014476427
Subject: Computer Science
Abstract/Summary:
The thesis presents a system that provides integrated access to heterogeneous information sources, which may contain unstructured or semistructured data not described by a regular schema (e.g., the World Wide Web). The sources may have different and limited query capabilities, and complete knowledge of their contents and structure may not exist.

First, an abstraction is proposed for the representation of semistructured sources. Then a query translation scheme is proposed for the rapid development of wrappers, i.e., agents that transform queries expressed in the common data model into queries in the native language of the underlying information source. The implementor provides a description of the (potentially limited) set of queries supported by the wrapper, along with actions that perform the translation.

Finally, an object-oriented logic is proposed for the declarative specification of mediators, i.e., agents that create integrated views of the data exported by the wrappers. The mediators can fuse data in an environment of semistructured sources and/or sources with changing schemas (indeed, the implementor does not need complete knowledge of the sources' schemas). The thesis presents and evaluates key query decomposition and optimization techniques that significantly reduce the cost associated with information fusion in the described environment. In addition, it presents an algorithm, run by the mediator, that takes the descriptions of the (potentially limited) sets of queries supported by the underlying wrappers and develops plans that retrieve the needed data using supported queries only. The descriptions may or may not be schema specific, and they can describe very large or even infinite sets of "query patterns".

Most of the proposed system is implemented as part of the TSIMMIS project at Stanford University, and it integrates information from relational databases, semistructured files, and legacy systems. Part of the work was done for the Garlic project at IBM Almaden.
Keywords/Search Tags: Information, Data, Semistructured, Query, Sources