Models and operators for continuous queries on data streams

Posted on:2007-11-29

Degree:Ph.D

Type:Dissertation

University:University of California, Los Angeles

Candidate:Law, Yan Nei

Full Text:PDF

GTID:1458390005984527

Subject:Computer Science

Abstract/Summary:

A new generation of data-intensive applications is emerging for managing and querying information that, rather than residing in databases, flows continuously through the network in the form of massive data streams. Hence, there is much research work on designing Data Stream Management Systems, and the approach favored by many research projects consists in extending database languages and technology for data streams. However, the new computational environment brings significant research challenges in areas such as query languages, query processing, and advanced applications.; In this dissertation, we first focus on the limitations of relational languages in expressing continuous stream queries. A main limitation follows from the fact that only nonblocking operators can be used in continuous queries, which makes relational languages incomplete on these queries. To address this problem, we investigate user-defined aggregates natively defined in SQL itself, and prove that these make SQL (i) Turing-complete on stored data, and (ii) complete on data streams. Furthermore, we illustrate the effectiveness of the proposed extensions on complex applications involving time-series queries, and mining queries.; For advanced applications, we focus on data-stream mining algorithms, which must now be redesigned to make lighter demands on resources and display greater adaptability than those on stored data. In this dissertation, we introduce ANNCAD, which uses multi-resolution data representation to classify new test points using the nearest-neighbors principle. The incremental property and very fast update speed make ANNCAD very suitable for mining data streams. Our experiments show that ANNCAD is adaptive and works well in many applications, including image recognition and censor surveying.; We then study the problem of stream query processing. We propose a load-shedding technique for multi-join, called MSketch, which makes decisions based on the productivity of tuples, rather than only the content of the joined pair stream. A thorough study shows that MSketch outperforms other existing algorithms. Finally, we propose general techniques for optimizing the accuracy of window aggregates, statistical aggregates and mining queries in the presence of sampling. Our method incorporates prior knowledge into an error model that is used to reduce the uncertainty introduced by sampling. We also extend the method to adjust to concept shifts.

Keywords/Search Tags:

Data, Queries, Applications, Continuous

Related items

1	Optimizing multiple continuous queries
2	Research And Implementation Of Continuous Geo-textual Skyline Queries
3	Continuous queries over data streams
4	Processing continuous queries over streaming data with limited system resources
5	Data Management Of Indoor Moving Objects
6	Research On Spatial Data Search Method Based On Top-k
7	Continuous Distance Queries And Updating Over Moving Objects
8	Approximate Evaluation of Queries for Spatial, Uncertain and Probabilistic Databases
9	Top-k Complex Event Query Technology For Data Streams
10	Research On Indexing Technology For Continuous Uncertain XML