Font Size: a A A

Algorithms For Data Streams Based On Shielding/Summarizing

Posted on:2007-11-16Degree:DoctorType:Dissertation
Country:ChinaCandidate:Z H ChongFull Text:PDF
GTID:1118360212984447Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Recently emerging data-intensive applications usually generate so called data streams in uncontrollable properties, coming order or interval for example while algorithms may consume almost infinite memory with respect to limited space available. Therefore, an algorithm for data streams is constrained to the following requirements: 1) it must run in sublinear space while its output may be approximate; 2) it can process inputs in an online way. Either shielding parts of data streams or summarizing data streams can be among alternative strategies for data streams processing when it can consume no more than sublinear space. In this thesis, we study several problems of data streams, including mining frequent item(set)s, estimating aggregation functions in distributed environments and searching k-median with the following contributions:1. Based on online shielding parts of data streams, we propose false negative algorithms for mining frequent item(set)s or maximal frequent itemsets. Using O(s-1 ln(2δ-1) memory, our algorithm can output frequent items with probability of at least 1-δ while capturing maximal frequent itemsets at the cost of O((K/s) ln (s-1δ-1)).2. Based on sampling data streams, we propose algorithms for filtering out redundant and inconsistent data hidden in distributed data streams. Our algorithms can lead to uniform samples and approximate solutions to estimating such aggregate functions as average aggregate function and k-median.3. Based on summarizing data streams, we propose a time-efficient computation of k-median under memory constraints.In addition to data-intensive applications, our researches can be applied to computational geometry, massive graph, machine learning, pattern recognition and etc.
Keywords/Search Tags:data streams, data streams mining, querying data streams
PDF Full Text Request
Related items