On interpreting and debugging results of database queries over imprecise data

Posted on: 2009-11-28
Degree: Ph.D.
Type: Dissertation
University: The University of Wisconsin - Madison
Candidate: Huang, Jiansheng
Full Text: PDF
GTID: 1448390005457737
Subject: Computer Science
Abstract/Summary:
Applications ranging from grid management to sensor nets to web-based information integration and extraction can be viewed as receiving data from a number of autonomous remote data sources and then answering queries over the collected data. In such applications, it is often impossible to enforce the currency, consistency and correctness of the centralized data with respect to the original sources. For example, distributed computing environments, including workflows in computational grids, are difficult to monitor because the state of the system may be captured only in logs distributed throughout the system. One approach to monitoring such systems is to "sniff" these distributed logs and store their transformed content in a DBMS. This centralizes the state and exposes it for querying; unfortunately, it also creates uncertainty about the recency and consistency of the data. As another example, uncertainty is ubiquitous in information extraction, because information extraction is still an imprecise art. We propose that instead of enforcing correctness, consistency and recency, such systems should report data quality properties along with query results and support provenance for query results, in the hope that this will allow the data to be interpreted appropriately. Toward this purpose, I present the following new concepts and techniques: (1) reporting consistency and recency of "relevant data sources" for relational queries, (2) reporting "k-relevant data sources" for relational queries, and (3) providing provenance-style explanations for non-answers to queries over extracted data.
Keywords/Search Tags:Data, Queries over, Results