
Similarity Query And Optimization Over Data Streams

Posted on: 2010-11-08 | Degree: Master | Type: Thesis
Country: China | Candidate: K Zheng
GTID: 2178360275491459 | Subject: Software and theory
Abstract/Summary:
In recent years, data streams have attracted considerable research interest because of their importance in real applications such as sensor networks, financial markets and multimedia. Data streams are characterized by several key features, namely high arrival speed, huge volume, and the fact that data cannot be retrieved again once it has passed, which makes them difficult to handle with a traditional DBMS. Much effort has gone into developing novel algorithms over data streams, most of it focused on designing efficient one-pass algorithms and effective sketches that summarize large amounts of data within limited memory. Several classic problems, such as mining frequent items, mining association rules and top-k queries, have been extended to the data stream setting and addressed elegantly. In this paper, we target another important query, the similarity query, which plays an essential role in financial data analysis, network monitoring, outlier detection and similar tasks. The problem is very challenging because the high dimensionality of data streams makes computing similarity extremely expensive. This paper first investigates several popular similarity functions in depth and then introduces the k-DTW distance as the similarity measure, which reduces computational complexity while preserving accuracy. To keep up with the high speed of data streams, we also extend the PAA (Piecewise Aggregate Approximation) technique for dimensionality reduction and prove a lower bound lemma for the transformed data. Based on these techniques, we develop two efficient algorithms that prune non-qualifying data as early as possible and support similarity queries over multiple data streams.
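The abstract does not define k-DTW or state its lower bound lemma, so the following Python sketch only illustrates the general "reduce, bound, then prune" pattern it describes, under explicit assumptions. DTW is assumed to use squared point costs inside a Sakoe-Chiba band of radius `r` (our guess at the kind of constraint involved, not the thesis's definition); candidate windows are reduced with PAA; and an LB_PAA-style bound from Keogh's work on exact DTW indexing stands in for the thesis's own lemma. All names (`paa`, `banded_dtw`, `envelope`, `lb_paa`, `range_query`) and the threshold `eps` are hypothetical. Because the bound never exceeds the banded DTW distance, any candidate whose bound is above `eps` can be discarded without running the quadratic DTW computation.

```python
import math


def paa(series, n_segments):
    """Piecewise Aggregate Approximation: the mean of each equal-length segment.
    Simplifying assumption: len(series) is divisible by n_segments."""
    w = len(series) // n_segments
    return [sum(series[i * w:(i + 1) * w]) / w for i in range(n_segments)]


def banded_dtw(q, c, r):
    """DTW with squared point costs, restricted to cells with |i - j| <= r
    (a Sakoe-Chiba band). Assumes len(q) == len(c)."""
    n = len(q)
    inf = float("inf")
    d = [[inf] * (n + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        # Only cells inside the band are filled; the rest stay infinite.
        for j in range(max(1, i - r), min(n, i + r) + 1):
            cost = (q[i - 1] - c[j - 1]) ** 2
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return math.sqrt(d[n][n])


def envelope(q, r):
    """Upper and lower envelopes of q over a sliding window of radius r."""
    n = len(q)
    upper = [max(q[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]
    lower = [min(q[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]
    return upper, lower


def lb_paa(upper, lower, c_paa):
    """LB_PAA-style lower bound on banded_dtw, computed from the candidate's PAA
    and the segment-wise max/min of the query envelope."""
    n, n_seg = len(upper), len(c_paa)
    w = n // n_seg
    total = 0.0
    for i, c_bar in enumerate(c_paa):
        u_hat = max(upper[i * w:(i + 1) * w])
        l_hat = min(lower[i * w:(i + 1) * w])
        if c_bar > u_hat:
            total += (c_bar - u_hat) ** 2
        elif c_bar < l_hat:
            total += (l_hat - c_bar) ** 2
    # Scale by the segment length w = n / n_seg so the bound matches the full-length sum.
    return math.sqrt(w * total)


def range_query(query, candidates, r, n_segments, eps):
    """Indices of candidates whose banded DTW to query is <= eps.
    The cheap lower bound is checked first; DTW runs only on survivors."""
    upper, lower = envelope(query, r)
    hits = []
    for idx, c in enumerate(candidates):
        if lb_paa(upper, lower, paa(c, n_segments)) > eps:
            continue  # safe to prune: the bound never exceeds the true distance
        if banded_dtw(query, c, r) <= eps:
            hits.append(idx)
    return hits


if __name__ == "__main__":
    q = [0, 1, 2, 3, 2, 1, 0, 0]
    cands = [
        [0, 1, 2, 3, 2, 1, 0, 0],  # identical window
        [0, 0, 1, 2, 3, 2, 1, 0],  # same shape, shifted by one step
        [5, 5, 5, 5, 5, 5, 5, 5],  # far away; pruned by the bound alone
    ]
    print(range_query(q, cands, r=1, n_segments=4, eps=1.0))  # → [0, 1]
```

Here the shifted window still matches because a band of radius 1 lets the warping path absorb the one-step shift, while the constant window is rejected before any DTW is computed. In an actual streaming setting the candidates would be the most recent `n` values of each stream; maintaining those windows and their PAA incrementally is not shown in this sketch.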
Keywords/Search Tags:data streams, similarity query, DTW distance, dimensionality reduction