Font Size: a A A

Research On Amnesic Synopsis Of Data Stream And Its Applications

Posted on:2009-09-06Degree:DoctorType:Dissertation
Country:ChinaCandidate:H H ChenFull Text:PDF
GTID:1118360272489272Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
With the increasingly widespread use of computer networks and electronic equipments, many real-life applications data appeared as dynamic evolving data streams which are continuous and unbounded in nature. Such applications include stock markets, network traffic monitoring, sensor networks, internet, network security, data mining, financial monitoring, and many more. Traditional data base techniques can hardly be applied to process such high-speed and unbounded data stream directly. So researches need to work out novel processing techniques over data streams.Maintaining a synopsis data structure dynamically from data stream is vital for a variety of streaming data applications, such as approximate query or data mining. In many cases, the significance of data item in streams decay with age: this item perhaps conveys critical information first, but, as time goes by, it gets less and less important until it eventually becomes useless. This feature is termed amnesic. The dissertation proposed a Hierarchical Amnesic Synopses (HAS) which includes the amnesic feature of data stream in the generation of synopses. HAS can provide a better approximate representation for data streams with amnesic feature than conventional data stream synopses.Our major contributions in the dissertation are as follows.1. The dissertation proposed HAS structure to utilize the amnesic feature of data stream. HAS structure has the ability to control the amnesic speed and the reconstruction error.2. Discrete Wavelet Transform is often used in construction of synopses for streaming data. We proposed a Wavelet-based Hierarchical Amnesic Synopses (W-HAS). To maintain W-HAS online for evolving data streams, the paper first explored the merger process of two wavelet decompositions, and then implemented the addition of data nodes in W-HAS structure based on the merger process. Using the addition of data nodes, W-HAS grows dynamically and hierarchically. The construction methods of W-HAS under sum of squared error (sse) and maximum absolute error metrics are discussed. Further, W-HAS with error control is also explored.3. We proposed Weighted Sampling based Hierarchical Amnesic Synopses (WS-HAS). The construction method of WS-HAS for weighted random sampling with and without replacement is discussed.4. We proposed Histograms Based Hierarchical Amnesic Synopses (H-HAS). The construction methods of H-HAS using Equi-width and V-optimal histograms are discussed.5. Clustering is useful in analyzing the paralleled data streams. Using the W-HAS and the RP-HAS (Random Projections based HAS), respectively, a fast computation of approximate distances between streams and the cluster center can be implemented, and an efficient online version of the classical K-means clustering algorithm is developed.6. We proposed HD-HAS (High Dimensional HAS).7. We introduced a novel sketch, EFM sketch. EFM sketch can be used to estimate the similarity of two sets. Based on HAS structure, we discussed the analyzing method of evolvement in data stream.
Keywords/Search Tags:data streams, amnesic, Synopses, Wavelets, clustering
PDF Full Text Request
Related items