Similarity Query And Optimization Over Data Streams

Posted on:2010-11-08

Degree:Master

Type:Thesis

Country:China

Candidate:K Zheng

GTID:2178360275491459

Subject:Software and theory

Abstract/Summary:

In recent years, data streams have attracted considerable research interest because of their importance in real applications such as sensor networks, financial marketing, and multimedia. Data streams are characterized by several key features, including high arrival speed, huge volume, and the fact that data cannot be retrieved again once it has passed, which make them difficult to handle with a traditional DBMS. Much effort has gone into developing novel algorithms over data streams, most of it focused on designing efficient one-scan algorithms and effective sketches that summarize large amounts of data within limited memory. Classic problems such as mining frequent items, mining association rules, and top-K queries have been extended to data stream environments and addressed elegantly. In this paper, we target another important query, the similarity query, which plays an essential role in financial data analysis, network monitoring, outlier detection, and so on. This problem is very challenging because the high dimensionality of data streams makes similarity computation extremely expensive. This paper first thoroughly investigates several popular similarity functions and then introduces the k-DTW distance as the similarity measure, which reduces computational complexity while preserving accuracy. To keep up with the high speed of data streams, we also extend the PAA technique to perform dimensionality reduction and prove the lower-bound lemma that holds after the transformation. Based on these techniques, we develop two efficient algorithms that prune non-qualifying data as early as possible and support similarity queries over multiple data streams.
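The building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's method: the abstract does not define k-DTW or the extended PAA, so the code shows only the classic forms, namely plain dynamic time warping (DTW) with an optional band constraint, standard Piecewise Aggregate Approximation (PAA), and the well-known PAA lower bound for Euclidean distance. All function names (`paa`, `dtw`, `paa_lower_bound`, `can_prune`) are hypothetical and chosen only for illustration.

```python
import math


def paa(series, segments):
    """Classic PAA: replace each equal-length segment by its mean."""
    n = len(series)
    if segments <= 0 or n % segments != 0:
        raise ValueError("this sketch assumes len(series) is a multiple of segments")
    width = n // segments
    return [sum(series[i * width:(i + 1) * width]) / width for i in range(segments)]


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def paa_lower_bound(a, b, segments):
    """Classic bound: sqrt(width) * ||PAA(a) - PAA(b)|| <= ||a - b||."""
    width = len(a) // segments
    return math.sqrt(width) * euclidean(paa(a, segments), paa(b, segments))


def dtw(a, b, window=None):
    """Plain DTW with absolute-difference cost.

    `window` is an optional band half-width (Sakoe-Chiba style); None means
    unconstrained. This is ordinary DTW, not the thesis's k-DTW.
    """
    n, m = len(a), len(b)
    if window is None:
        window = max(n, m)
    window = max(window, abs(n - m))  # the band must reach the final cell
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [math.inf] * (m + 1)
        for j in range(max(1, i - window), min(m, i + window) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]


def can_prune(query, candidate, segments, threshold):
    """Skip the exact Euclidean computation when the cheap bound already fails."""
    return paa_lower_bound(query, candidate, segments) > threshold
```

Because the bound never exceeds the true Euclidean distance, `can_prune` never discards a candidate that is actually within the threshold. The sketch does not cover a lower bound for DTW-style distances, which is what the thesis proves for its own transformation.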

Keywords/Search Tags:

data streams, similarity query, DTW distance, dimensionality reduction

Related items

1	Dimensionality Reduction Technique For Visualization In Wasserstein Space
2	LLE Dimensionality Reduction And Its Application In The Infrared And Low-light Image Recognition
3	Research On Similarity Query Over Sequence Data
4	Multi-label Learning Based On Dimensionality Reduction
5	Application Of T-SNE Algorithm In Dimensionality Reduction Of High Dimensional Data
6	Research On Dimensionality Reduction And Prediction Methods In Time Series Data Mining
7	Reduction Algorithm For Skyline Query Results Based On Dimension Preferences
8	Research On Dimensionality Reduction And Quantification Methods Of Approximate Nearest Neighbor Query For Streaming Data
9	Research On Unsupervised 2D Dimensionality Reduction Algorithms With Adjacency Graph Learning
10	The Application Of Manifold Learning In Data Dimensionality Reduction