Font Size: a A A

Key Technologies Research On Burst Detection Over Data Streams

Posted on:2009-10-02Degree:DoctorType:Dissertation
Country:ChinaCandidate:Z J YuanFull Text:PDF
GTID:1118360278456603Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Data stream has two essential characteristics: mutable and unpredictable. As a basic and important method of data analysis, burst detection on data stream can detect the abnormal changes of some specific element's frequency in time. Consequently, it has attracted extensive attentions from the research and industry communities. This technology has been adopted in many applications, including the photon burst detection, stock fluctuations detection, network traffic fluctuations detection and so on. Burst detection on data stream is a new research field and has introduced a lot of challenges and unsolved problems. (1) How to store the elements and its frequency efficiently. Burst detection focuses on the elements changing greater, thus all elements and its frequency must be recorded. The number of these elements is very large and keeps continuously changing. The difference of each element is also notable. How to store the elements and its frequency efficiently while supporting effective querying is definitely a challenging work. (2) How to abate the influence which the total flow incurs to single element in burst detection. In some application scenarios, the total flow of data stream fluctuates acutely and causes the acute fluctuation of the single element's frequency. . Finding an approach to abate the influence which the total flow incurs to single element in burst detection is worth paying more attention to. (3) How to accomplish many monitoring tasks in parallel and real-time with limited space resources within one pass. Burst detection is a typical application of data stream monitoring. At the same time, there also have some other monitoring applications in the monitoring system, such as Top-k querying, frequent mining, change detection, outline detection and so forth. How to accomplish many monitoring tasks in parallel and real-time with limited space resources by one pass is a challenging work.This dissertation puts emphasis on the above challenges of burst detection, and addresses several key problems on burst detection processing. The main contributions of this are dissertation concluded as follows:1. We propose a self-adapting, flexible and scalable Bloom Filter called ExBF to address the dynamic extendible problem of storing elements of data stream. ExBF is constructed based on the extendible hashing. Comparing to existing traditional extendible Bloom Filters such as PBF, SpBF and ScBF, ExBF holds four notable advantages: (1) The time complexity of element updating and querying is O(1) and does not rise with the count of buckets; (2) Extending method is flexible. ExBF does not need to re-distribute all elements while extending data structure. The size of data migration and the impaction on original data structure are both small; (3) The size of ExBF can be reduced dynamically with the element number being reduced; (4) The false positive rate can be controlled to an acceptable threshold. 2. We come up with a novel solution based on Improved Counting Bloom Filter, named as BCBF+HSet, to address the space-efficient problem of storing elements and its frequency in data stream. This solution make use of the relative stable data structure, so it can save about 25% space and abate the acute fluctuation acutely of little elements. On the condition of heavy-tailed distribution of all the elements frequencies, we propose a novel Hierarchical Counting Bloom Filters (HCBF) data structure. This method can reduce about 30% space cost with appropriate parameters.3. We bring forward a Flow Free Burst Detection method called FFBD to address the accurate burst detection problem during the total flow of data stream fluctuates acutely. This method introduces the ratio of one element's frequency and total elements'frequency as the aggregation function value to check burst. Comparing to the traditional aggregation pyramid, FFBD provide the capability of avoiding the influence by total flow at the cost of 2% more space and 5% more computational complexity.4. We also put forward a Grid Dividing based Multiple Monitoring Task Processing Method called GD-MMTPM. This method finds the relationship between outlier, change and burst in data stream, and can process outlier detection, change detection and burst detection synchronously. Different from other methods, GD-MMTPM only need maintain the simple grid information, thus it can reduce time complexity and space complexity significantly.In summary, this dissertation addresses the problem of burst detection in data stream, and proposes several specific algorithms, methods and models. This is a promotion of data processing on both theoretical study and practical applications.
Keywords/Search Tags:data stream, data stream mining, burst detection, Bloom filter, data stream monitoring, efficient storing
PDF Full Text Request
Related items