Hashing Based Research On Approximate Query Of Time Series

Posted on:2016-10-10

Degree:Master

Type:Thesis

Country:China

Candidate:G P Liu

Full Text:PDF

GTID:2308330476952168

Subject:Computer application technology

Abstract/Summary:

PDF Full Text Request

In recent years, with the rapid development of Internet, thousands of data are producing. In some areas, such as data mining, machine learning, information retrieval, it is particularly important of mining useful information from those data. One of those data, which is called time series, is an extremely important and time dependent data. Time series exist widely in various fields including finance, geography, medicine, meteorology, and so on.At present, time series data mining mainly focus on similarity search, clustering/classification analysis, visualizing time series, segmentation, prediction and anomaly detection, etc.Time series similarity is one of the fundamental problems in time series mining. As a result,searching the time series similarity as the research target is very interesting. Due to the numerical and continuous nature, time series is always considered as a whole instead of individual numerical field. Therefore, the similarity search on time series is approximate, which does not similar to the certain query in traditional database.Due to the high-dimensionality of time series, it is difficult to deal with it directly. Hashing is a very common compression mapping technology, which can transform an input of any length to a fixed length output by hashing algorithm. The output’s space is typically much smaller than the input’s one, so we can use the hashing to process the time series.The main contributions are following:First of all, using the LSH(Locality Sensitive Hashing) technology, we propose an algorithm called LSHSM to process the subsequence matching problem of time series. LSH can hash the objects which are close to each other to the same bucket with high probability. Therefore, we can filter out a lot of dissimilar objects without unnecessary comparisons, which will improve the speed of retrieval. Different to FRM or DualMatch, this paper does not require the time series to do DFT or DWT feature transformation. We directly regard the time series as a high-dimensional data points, and use the nature of LSH to handle time sequences. When doing the experiment we use three different datasets to show that the effectiveness of the algorithm.Then, by introducing the concept of the associative deletion, we deal with two-attribute time series. If somebody wants to delete one attribute according to another one, he needs to store the related information of these two attribute to maintain data consistency. Bloom filter is a very powerful tool to represent the data sketch. We use them to represent the sketches of two attribute of time series, and then realize the deletion of the expired data. We solve the problem of inconsistencycaused by deleting data which has multiple attributes, and with low space overhead. Experiments show that within the controllable range of error rate, the deletion of time series has high accuracy.

Keywords/Search Tags:

PDF Full Text Request

Related items

1	Research On Similarity Image Retrieval Based On Locality Sensitive Hashing And Structured P2P Network
2	Research On Hashing Methods Towards Large-scale Similarity Computation And Search
3	Bloom Filter Algorithm For Low-overhead Approximate Query For Missing Data
4	Dynamic Similarity Search Over Encrypted Data
5	Research On Locality Sensitive Hashing-Based Similarity Search
6	Locality Sensitive Hashing Algorithm Based On P-stable Distribution And Space Sphere Lattice
7	Working Towards Performance Analysis Of Locality Sensitive Hashing
8	Approximate Similarity Search In High-Dimensional Euclidean Space
9	Secure Similarity Search Over Encrypted High-Dimension Data
10	Research Of Approximate K-Nearest Neighbors Search Algorithm Based On Locality Sensitive Hashing