Study On Correlation Analysis In Time Series Data Streams

Posted on: 2009-07-04
Degree: Master
Type: Thesis
Country: China
Candidate: D J Yue
GTID: 2178360308479732
Subject: Computer application technology
Abstract/Summary:
As database theory has developed, many new database technologies have been introduced. Data streams have recently received considerable attention because of several important applications, such as sensor network monitoring, network analysis, financial data analysis, and scientific data processing. A data stream is a continuous, time-varying, unbounded sequence of data items, so online stream algorithms are restricted to a single pass over the data. Processing rapidly arriving stream data in limited space is a major challenge for data mining research and applications.

Similarity query underlies many mining tasks, such as clustering, classification, frequent pattern discovery, and novelty detection, and has therefore become a key problem in data mining. In this paper, we adopt correlation as the distance measure and propose a series of algorithms for fast correlation analysis over multiple time series data streams.

Our main contributions are as follows:

(1) A new data reduction technique based on Boolean representation is proposed. Once the original sequences are transformed into Boolean series, correlation results can be obtained efficiently through bit operations on those series. To the best of our knowledge, this is the simplest representation available, and it uses very little memory.

(2) A hierarchical Boolean representation (HBR) algorithm is introduced for synchronization analysis among multiple time series data streams with a fixed sliding window size. The method transforms each original stream series into a Macro-Boolean series and a Micro-Boolean series. The candidate correlation set is then obtained by efficient bit operations.

(3) A novel Boolean lag correlation (BLC) algorithm is presented, built on the synchronization analysis. In the same way, the lag time can be obtained efficiently with a simple Boolean lag correlation technique.

(4) A self-adaptive correlation analysis (SACA) algorithm is given for the case where users have no prior knowledge of the sliding window size. The Boolean auto-correlation coefficient reveals the periodic trend of each stream series. Series with similar periods are then grouped together, so that the window size can be adjusted adaptively within each group.

Theoretical analysis and experimental evaluation show that the proposed algorithms are efficient.
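The idea behind contributions (1) and (3) can be illustrated with a minimal Python sketch. It is not the HBR or BLC algorithm from the thesis: the abstract does not state how values are mapped to bits, so the rule used here (a bit is 1 when the value lies above the window mean) is an assumption, and the function names `to_bits`, `boolean_agreement`, and `best_lag` are hypothetical. The agreement score is a simple stand-in for correlation, not the Pearson coefficient.

```python
from typing import List, Tuple


def to_bits(window: List[float]) -> int:
    """Pack a window into an int; bit i is 1 when window[i] > window mean.

    The above-mean rule is an assumed encoding, not taken from the thesis.
    """
    mean = sum(window) / len(window)
    bits = 0
    for i, value in enumerate(window):
        if value > mean:
            bits |= 1 << i
    return bits


def boolean_agreement(a: int, b: int, n: int) -> float:
    """Score in [-1, 1] from the number of matching bits among the low n bits."""
    mask = (1 << n) - 1
    # XNOR marks positions where both series have the same bit.
    matches = bin(~(a ^ b) & mask).count("1")
    return 2.0 * matches / n - 1.0


def best_lag(a: int, b: int, n: int, max_lag: int) -> Tuple[int, float]:
    """Return the lag in [0, max_lag] at which b, delayed by that lag, best matches a.

    Requires max_lag < n so that every lag leaves a non-empty overlap.
    """
    best = (0, -1.0)
    for lag in range(max_lag + 1):
        overlap = n - lag
        # Shifting b right by `lag` aligns b[i + lag] with a[i].
        score = boolean_agreement(a, b >> lag, overlap)
        if score > best[1]:
            best = (lag, score)
    return best
```

Each window of n values costs n bits, and comparing two windows is one XOR plus a bit count, which is where the memory and speed savings described above would come from. A score near 1 or -1 marks a pair as a candidate that could then be checked against the raw values.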
Keywords/Search Tags: time series data stream, correlation analysis, Boolean representation, lag correlation, self-adaptivity