Font Size: a A A

Hashing Based Research On Approximate Query Of Time Series

Posted on:2016-10-10Degree:MasterType:Thesis
Country:ChinaCandidate:G P LiuFull Text:PDF
GTID:2308330476952168Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
In recent years, with the rapid development of Internet, thousands of data are producing. In some areas, such as data mining, machine learning, information retrieval, it is particularly important of mining useful information from those data. One of those data, which is called time series, is an extremely important and time dependent data. Time series exist widely in various fields including finance, geography, medicine, meteorology, and so on.At present, time series data mining mainly focus on similarity search, clustering/classification analysis, visualizing time series, segmentation, prediction and anomaly detection, etc.Time series similarity is one of the fundamental problems in time series mining. As a result,searching the time series similarity as the research target is very interesting. Due to the numerical and continuous nature, time series is always considered as a whole instead of individual numerical field. Therefore, the similarity search on time series is approximate, which does not similar to the certain query in traditional database.Due to the high-dimensionality of time series, it is difficult to deal with it directly. Hashing is a very common compression mapping technology, which can transform an input of any length to a fixed length output by hashing algorithm. The output’s space is typically much smaller than the input’s one, so we can use the hashing to process the time series.The main contributions are following:First of all, using the LSH(Locality Sensitive Hashing) technology, we propose an algorithm called LSHSM to process the subsequence matching problem of time series. LSH can hash the objects which are close to each other to the same bucket with high probability. Therefore, we can filter out a lot of dissimilar objects without unnecessary comparisons, which will improve the speed of retrieval. Different to FRM or DualMatch, this paper does not require the time series to do DFT or DWT feature transformation. We directly regard the time series as a high-dimensional data points, and use the nature of LSH to handle time sequences. When doing the experiment we use three different datasets to show that the effectiveness of the algorithm.Then, by introducing the concept of the associative deletion, we deal with two-attribute time series. If somebody wants to delete one attribute according to another one, he needs to store the related information of these two attribute to maintain data consistency. Bloom filter is a very powerful tool to represent the data sketch. We use them to represent the sketches of two attribute of time series, and then realize the deletion of the expired data. We solve the problem of inconsistencycaused by deleting data which has multiple attributes, and with low space overhead. Experiments show that within the controllable range of error rate, the deletion of time series has high accuracy.
Keywords/Search Tags:time series, similarity search, locality sensitive hashing, associative deletion, bloom filter
PDF Full Text Request
Related items