
Online management and mining of heterogeneous and dynamic time-series

Posted on: 2009-04-29
Degree: Ph.D
Type: Dissertation
University: The Ohio State University
Candidate: Altiparmak, Fatih
Full Text: PDF
GTID: 1448390005459831
Subject: Computer Science
Abstract/Summary:
In this PhD dissertation, we propose database solutions for some of the major challenges in mining and managing time-series data. In particular, we propose a framework for mining heterogeneous time-series data, and a framework for online summarization and analysis of dynamic time-series data.

We propose a general framework, Information Mining, to acquire information from heterogeneous and potentially high-dimensional time-series data. The framework consists of two major steps. First, significant, clean, and homogeneous subsets of the data are identified and analyzed using a data mining algorithm. Second, the information gathered in the first step is refined by identifying common (or distinct) patterns across the mining results of the subsets. We extend our approach to a class of mining tasks over microarray and clinical trials time-series applications and show that Information Mining is an effective method for mining these datasets.

In a multiple data stream application, a new element for each data sequence, i.e. each time-series, is periodically inserted into the database. The data in multiple streams is usually compressed due to storage limitations, and is reconstructed at query time. The quality of this reconstruction should be good enough to run general types of queries on it, such as range and k-NN queries. We present an online technique, PQ-Stream, which provides a high-quality reconstruction. We show that PQ-Stream significantly outperforms current techniques for a wide variety of query types on both synthetic and real data sets.

Query interest is not uniformly distributed over all time units; most queries involve the newest few time units. Storage can therefore be assigned to the time units according to their rank in query interest. We propose the Ladder Approach to mitigate the stress on storage in multi-stream systems by adding the element of age to the sliding window. The Ladder Approach is shown to be effective in two real streaming applications: weather and stock data.
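
The idea of giving newer time units more storage than older ones can be illustrated with a small sketch. The abstract does not describe how the Ladder Approach or PQ-Stream actually compress data, so everything below is an assumption made for illustration: the function name `ladder_compress`, the `rungs` parameter, and the use of simple block averaging as the compression step are all hypothetical and are not taken from the dissertation.

```python
from statistics import mean


def ladder_compress(series, rungs):
    """Summarize `series` (oldest value first) more coarsely as data ages.

    Hypothetical illustration only. `rungs` is a list of (span, block)
    pairs ordered from newest to oldest: the newest `span` values are
    averaged in blocks of `block` values, then the next `span` values
    with the next block size, and so on. Values older than the last
    rung are dropped.
    """
    out = []
    end = len(series)
    for span, block in rungs:
        start = max(0, end - span)
        segment = series[start:end]
        # Walk the segment from newest to oldest so that any short
        # leftover block falls on the oldest side of the rung.
        blocks = []
        i = len(segment)
        while i > 0:
            j = max(0, i - block)
            blocks.append(mean(segment[j:i]))
            i = j
        out.append(blocks[::-1])
        end = start
        if end == 0:
            break
    return out[::-1]  # oldest rung first


# Twelve readings: the newest four stay at full resolution, the middle
# four are averaged in pairs, and the oldest four collapse to one value.
readings = list(range(1, 13))
summary = ladder_compress(readings, [(4, 1), (4, 2), (4, 4)])
stored = sum(len(rung) for rung in summary)
```

In this toy setting the twelve readings are kept as seven stored values, with all of the loss concentrated in the oldest data, where the abstract says query interest is lowest.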
Keywords/Search Tags:Mining, Data, Time-series, Online, Propose