Font Size: a A A

Adaptive query processing in data stream management systems

Posted on:2006-01-18Degree:Ph.DType:Thesis
University:Stanford UniversityCandidate:Babu, ShivnathFull Text:PDF
GTID:2458390008970305Subject:Computer Science
Abstract/Summary:
Many modern applications need to process data streams that consist of data elements generated in a continuous unbounded fashion. Examples include network monitoring, financial monitoring over stock tickers, sensor processing for environmental monitoring or inventory tracking, telecommunications fraud detection, and others. These applications have spurred interest in a new class of systems, called Data Stream Management Systems (DSMSs), that enable applications to pose long-running continuous queries over data streams.; A fundamental challenge faced by DSMSs is that stream conditions (e.g., data distribution, arrival rate) and system conditions (e.g., query load, memory availability) may vary significantly over the lifetime of a continuous query. When stream or system conditions change, a query execution strategy that was efficient before the change may become very inefficient. Consequently, it is important for a DSMS to support adaptive query processing: The DSMS must be prepared to change the execution plan for a continuous query while the query is running, based on how stream and system conditions change. Without adaptivity, plan performance may drop drastically over time.; This thesis presents a generic framework, called StreaMon, for adaptive query processing in a DSMS. StreaMon has three core components: (i) An Executor, which runs the current plan for each query, (ii) a Profiler, which collects and maintains statistics about current stream and system conditions, and (iii) a Re-optimizer, which ensures that the current plans are the most efficient for current conditions. We instantiate the generic StreaMon framework for three distinct combinations of continuous query type and adaptivity need: (1) Adaptive processing of commutative filters over a stream to maximize throughput at all points in time. (2) Adaptive placement of subresult caches in pipelined plans for windowed stream joins to maximize throughput at all points in time. (3) Detecting relaxed constraints automatically in input streams and exploiting these constraints to reduce memory requirements in plans for windowed stream joins. For each problem, we provide the definition and motivating examples, develop and analyze adaptive algorithms, and present implementation techniques and experimental results from the STREAM general-purpose DSMS prototype developed at Stanford.
Keywords/Search Tags:STREAM, Adaptive, Data, DSMS, System, Continuous
Related items