
Research On Uncertain Time Series Similarity Matching

Posted on: 2013-01-27  Degree: Master  Type: Thesis
Country: China  Candidate: Y F Zuo  Full Text: PDF
GTID: 2218330371455858  Subject: Computer software and theory
Abstract/Summary:
A time series is a sequence of records arranged in chronological order. Similarity matching is one of the underlying operations for time series clustering, outlier detection, and pattern discovery. Currently, the study of time series similarity matching focuses mainly on deterministic data. With the development of the Internet of Things and privacy protection technology, uncertain time series will appear in large numbers, and time series similarity matching faces new challenges. For uncertain time series, the distance between two sequences is itself uncertain, so similarity matching methods designed for deterministic time series cannot be applied directly.

To solve the problem of uncertain time series similarity matching, we establish a data model to describe uncertain time series. Under this model, the data point at each time slot is represented by a set of sampled observations. Each sample has the same probability of occurrence (that is, the samples are uniformly distributed), and the different time points of a series are treated as independent of one another. In this model, the true distance between two uncertain time series takes one of a large number of possible values, each with a certain probability. On the basis of this model, two algorithms are proposed for uncertain time series similarity matching: a-PRQ (mean method) and k-PRQ (cluster method).

(1) a-PRQ
Depending on whether the query sequence and the time series stored in the database are deterministic, uncertain time series similarity queries are divided into three types. Then, for each type, the mean (averaging) method extracts a deterministic sequence from each uncertain sequence to represent the original, and the query is answered by deterministic time series similarity matching.
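
The sample-based model and the averaging idea behind a-PRQ can be illustrated with a minimal Python sketch. It is not the thesis implementation: the Euclidean distance, the function names (`match_probability`, `mean_series`, `mean_distance`), and the toy data are all assumptions made for illustration. An uncertain series is written as a list of sample lists, one per time slot, and a deterministic series is the special case of one sample per slot.

```python
import itertools
import math


def euclidean(a, b):
    """Euclidean distance between two deterministic series of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_probability(u, v, eps):
    """Exact Pr[dist(u, v) <= eps] by enumerating every possible world.

    Samples in a slot are equally likely and slots are independent, so
    every combination of one sample per slot has the same probability.
    The cost grows exponentially with series length (toy inputs only).
    """
    worlds_u = list(itertools.product(*u))
    worlds_v = list(itertools.product(*v))
    hits = sum(1 for a in worlds_u for b in worlds_v if euclidean(a, b) <= eps)
    return hits / (len(worlds_u) * len(worlds_v))


def mean_series(u):
    """Mean method: replace each slot's sample set by its average."""
    return [sum(samples) / len(samples) for samples in u]


def mean_distance(u, v):
    """Distance between the deterministic representatives of u and v."""
    return euclidean(mean_series(u), mean_series(v))


# Hypothetical data: an uncertain series u and a deterministic query v.
u = [[1.0, 3.0], [2.0, 4.0]]
v = [[1.0], [2.0]]
print(match_probability(u, v, 2.0))  # 0.75: three of the four worlds are within 2.0
print(mean_distance(u, v))           # distance between [2.0, 3.0] and [1.0, 2.0]
```

The exhaustive enumeration shows why the true distance is a distribution rather than a single number, while the mean-based distance collapses it to one value that an ordinary deterministic matcher can use.
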

(2) k-PRQ
This algorithm reduces computational complexity mainly through two pruning steps:
1) Clustering reduces the sample size: after clustering, distances are computed with each cluster as a unit, which greatly reduces the amount of computation.
2) For a given threshold ε, pre-computed upper and lower bounds are used to derive upper and lower bounds on the probability that the distance is within ε; these probability bounds filter out unnecessary calculations and further reduce the computational cost.

The experiments show that the two uncertain time series similarity matching algorithms achieve good performance and accuracy.
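
The two pruning steps of k-PRQ can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the algorithm from the thesis: the clustering is a plain sort-and-split into contiguous groups (a stand-in for a real clustering method), the bounds are simple interval-based bounds on the Euclidean distance that settle the probability as 0 or 1 in the extreme cases, and the probability threshold `tau` and all function names are hypothetical.

```python
import itertools
import math


def cluster_slot(samples, k):
    """Group one slot's samples into at most k contiguous groups.

    Returns (centroid, weight) pairs, where weight is the fraction of the
    slot's samples in the group. Sort-and-split stands in for clustering.
    """
    ordered = sorted(samples)
    k = min(k, len(ordered))
    size = math.ceil(len(ordered) / k)
    groups = [ordered[i:i + size] for i in range(0, len(ordered), size)]
    return [(sum(g) / len(g), len(g) / len(ordered)) for g in groups]


def distance_bounds(u, v):
    """Lower and upper bounds on the Euclidean distance over all worlds."""
    lo = hi = 0.0
    for su, sv in zip(u, v):
        # Gap between the two sample ranges (0 if they overlap).
        gap = max(0.0, min(su) - max(sv), min(sv) - max(su))
        # Largest possible difference between a sample of su and one of sv.
        far = max(max(su) - min(sv), max(sv) - min(su))
        lo += gap ** 2
        hi += far ** 2
    return math.sqrt(lo), math.sqrt(hi)


def cluster_probability(u, v, eps, k):
    """Approximate Pr[dist(u, v) <= eps] using weighted cluster centroids."""
    cu = [cluster_slot(s, k) for s in u]
    cv = [cluster_slot(s, k) for s in v]
    prob = 0.0
    for wu in itertools.product(*cu):
        for wv in itertools.product(*cv):
            d = math.sqrt(sum((a[0] - b[0]) ** 2 for a, b in zip(wu, wv)))
            if d <= eps:
                # Slots are independent, so the weights multiply.
                prob += math.prod(a[1] * b[1] for a, b in zip(wu, wv))
    return prob


def prq_match(u, v, eps, tau, k=2):
    """Decide whether Pr[dist(u, v) <= eps] >= tau, pruning where possible."""
    lo, hi = distance_bounds(u, v)
    if lo > eps:   # no world can be within eps: probability is 0
        return False
    if hi <= eps:  # every world is within eps: probability is 1
        return True
    return cluster_probability(u, v, eps, k) >= tau
```

With four samples per slot and `k=2`, each slot contributes two centroids instead of four samples, so the number of combinations to evaluate shrinks from 4^n to 2^n for a series of length n, and candidates settled by the bounds skip that evaluation entirely.
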
Keywords/Search Tags:uncertain time series, uncertain data model, similarity matching, probabilistic range query, time series distance