Interactive query processing

Posted on: 2002-10-09
Degree: Ph.D
Type: Thesis
University: University of California, Berkeley
Candidate: Raman, Vijayshankar
Full Text: PDF
GTID: 2468390011497032
Subject: Computer Science
Abstract/Summary:
Information extraction is increasingly a long and frustrating iterative process, because of large data sizes, increasing data distribution, and the hard-to-automate nature of many processing tasks. This thesis investigates the alleviation of this problem through interactive query processing. Interactivity involves giving users continual feedback during query execution in the form of partial query results, and allowing users to dynamically control the query execution according to their interests in these partial results.

In this thesis, we develop modifications to the standard query processor architecture that enable such user-system interaction. We start by developing a pipelining reorder operator that can be inserted into a standard query plan to make it dynamically tunable during query execution. This operator uses the throughput differences between adjacent query operators to reorder tuples within the query dataflow and prioritize the processing of tuples of interest to the user. We then study the application issues involved in interactive processing by developing an interactive data cleaning and transformation tool. This tool allows users to explore large datasets on a spreadsheet-like interface, and to graphically specify transforms to clean errors in the data format. All operations are performed with instantaneous response times, by focusing work on data that is visible to the user.

We then investigate the generation of partial result records that may not contain all output columns, as a way to improve system interactivity during query execution. Our focus is on generating these partial results in a fashion that is responsive to both the user's interests in the results and the properties of the data sources involved in the query. A significant hurdle to such partial result generation is the traditional query execution dataflow of optimizer-selected query plans. We develop a more dynamic dataflow scheme that continually adapts two orderings within the query dataflow: the order in which intermediate tuples are routed, and the order in which these tuples flow through query operators. We then refine the granularity of query operators in this architecture, routing tuples not through logical query operators like join operators, but instead through physical operators like query data-structures. This scheme allows the query processor to adapt query execution at a fine granularity, and respond more effectively to changing user interests and data source properties.
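
The reorder operator described above can be illustrated with a small sketch. This is not the thesis's implementation: the class `ReorderBuffer`, its methods `set_priority` and `next_tuple`, and the fixed-size prefetch buffer are all hypothetical names and simplifications chosen for illustration. The sketch assumes a producer that runs ahead of a slower consumer (modeled here by prefetching up to `capacity` tuples), so the buffered tuples can be handed on in an order that reflects the user's current interests.

```python
from typing import Any, Callable, Iterable, List, Optional


class ReorderBuffer:
    """Hypothetical sketch of a reorder step between two query operators.

    Tuples from a fast upstream operator are buffered; the slower
    downstream operator always receives the buffered tuple that the
    current priority function scores highest.
    """

    def __init__(self, source: Iterable[Any], capacity: int,
                 priority: Callable[[Any], float]) -> None:
        self._source = iter(source)
        self._capacity = capacity
        self._priority = priority
        self._buffer: List[Any] = []

    def set_priority(self, priority: Callable[[Any], float]) -> None:
        # The user can change interests while the query is running.
        self._priority = priority

    def _fill(self) -> None:
        # Model the faster producer: prefetch until the buffer is full
        # or the source is exhausted.
        while len(self._buffer) < self._capacity:
            try:
                self._buffer.append(next(self._source))
            except StopIteration:
                break

    def next_tuple(self) -> Optional[Any]:
        # Return the highest-priority buffered tuple, or None when done.
        self._fill()
        if not self._buffer:
            return None
        # max() keeps the earliest index on ties, preserving arrival order.
        best = max(range(len(self._buffer)),
                   key=lambda i: self._priority(self._buffer[i]))
        return self._buffer.pop(best)


if __name__ == "__main__":
    rows = [("east", 1), ("west", 2), ("east", 3), ("west", 4), ("north", 5)]
    buf = ReorderBuffer(rows, capacity=4,
                        priority=lambda t: 1.0 if t[0] == "west" else 0.0)
    print(buf.next_tuple())  # a "west" tuple is delivered first
    # The user's interest shifts mid-query.
    buf.set_priority(lambda t: 1.0 if t[0] == "north" else 0.0)
    print(buf.next_tuple())
```

A linear scan over the buffer is used instead of a heap because the priority function may change at any time, which would invalidate a heap's ordering; a production design would likely need an index that supports such changes cheaply.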
Keywords/Search Tags: Query, Data, Interactive, Processing