Font Size: a A A

Study On Query Optimization Methods Of Segmented Time Series

Posted on:2011-10-02Degree:DoctorType:Dissertation
Country:ChinaCandidate:T JiangFull Text:PDF
GTID:1118360305992006Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Time series is a sequence data formed by sampling at discrete time and has a large difference from the other data. It exists in many domains, for example, biomedicine, finance, meteorology, natural science, etc. Time series processing is regarded as a very important and valuable technology. At present, it has obtained many successful applications in many critical domains, such as sensor network monitoring, financial data analysis, DNA sequence analysis, moving objects tracking and motion capture, etc. However, time series is a typical high dimension and large mass data. Therefore, there still exists a big challenge of time series processing.Around the segmenting and query optimization of time series, we carry out research in five areas:the related techniques of time series segmenting, the query optimization of static segmented time series based on clustering, reverse nearest neighbor (RNN) query based on partition for static time series segmented, special pattern matching for dynamic segmented time series, and correlation query based on grid technique for dynamic time series segmented.Considering the important significance of segmenting towards time series processing and the lack of non-equal length segmenting method for dynamic time series, we study equal and no-equal length segmenting technologies of static and dynamic time series. An adaptive segmentation algorithm over multiple data streams, called QONSP, is proposed, using the incremental computation of PAA (Piecewise Aggregate Approximation) and PLA (Piecewise Linear Approximation). It only has linear time complexity. The experimental results show that, QONSP can segment up to one thousands of dynamic time series in real time, and control the accuracy and length of segmentation by adjusting corresponding to parameters.In order to improve the query efficiency of static time series after segmenting, we study the impact of clustering on time series range queries. A symmetric low bounding function based on the equal length segmentation, called SLBS, is introduced in the paper. Moreover, we also prove that the distance based on SLBS between time series is smaller than Euclidean distance. Then, taking advantage of SLBS and clustering, we give an algorithm of range query for static time series, namely RQIC, which integrates three techniques: first-k filtering, the triangular inequality pruning, and lower bounding filtering. The experimental results show that, RQIC can prune at least 60% objects in most instances, and has an approximate query performance of PLA.Owing to the lack of the query optimization on static segmented time series using B+-tree, we study RNN query. For any two different time series, they may be near objects if their whole and local trends keep similar. Using the universal characteristic, all time series can be partitioned, and indexed into a B+-tree. Finally, an approach called RiDistance which is inspired by iDistance is proposed, using the filtering-refine framework. The experimental results show that, RiDistance obtains 1-2 orders of magnitude performance improvement relative to sequential scan algorithm.In order to overcome the problem of that existing function of pattern matching cannot automatically adapt the change of length and amplitude, we study how to optimize the special pattern query of dynamic time series. A novel function of patterns similarity is introduced and we prove that it is a metric function. Thus, we can accelerate the query computation by triangular inequality. Meanwhile, we also proposed an efficient pattern matching algorithm and a probability prediction approach to predict the possible emerging patterns based on the statistatic information. The experimental results show that, our methods can monitor most special patterns among them and its pattern similarity function can adapt the change of amplitude offset and scaling better than other famous functions.Furthermore, towards the drawbacks of subsequence correlation query and the lack of motif discovery algorithms of dynamic time series, we study the optimization method using grid structure. A distance function SDD which can automatically adapt the change of length and amplitude and is proved satifying the properties of metric function, is introduced in the paper. Using previous QONSP and SDD, we proposed a model of correlation monitoring MCALP, based on the projection of grid. The model can monitor the minimum correlation (or crossed correlation) and maximum correlation (or motif correlation). MCALP gives two methods, namely, MCPDG (Minimal Correlative Pattern Discovery based on Grid) and PMDGS (P-Motif Discovery based on Grid Structure), by proving two importance performance theorems. Experimental results using finance data streams show that, our approaches are effective and only have a linear time and space complexity.
Keywords/Search Tags:data mining, query optimization, clustering, correlation, motif discovery, reverse nearest neighbor, segmented time series
PDF Full Text Request
Related items