Managing erratic data streams in the distributed environments

Posted on: 2007-12-25
Degree: Ph.D.
Type: Dissertation
University: University of California, Riverside
Candidate: Zhu, Shanzhong
Full Text: PDF
GTID: 1448390005965532
Subject: Computer Science
Abstract/Summary:
We study how to efficiently manage and process erratic streaming data in distributed environments. Erratic data streams are numerical sequences that change frequently and unpredictably, and they arise in many important applications. Examples include stock data and sensor data such as temperature, pressure, and humidity. In many streaming applications, such as real-time financial applications and sensor monitoring systems, user queries must be processed on-line to ensure prompt delivery of query results. It is therefore important to design scalable and efficient systems to process queries on such streams.

In this dissertation, we address two important data management techniques on erratic streams: caching and aggregation. First, we propose a novel pull-based caching scheme to maintain stochastic consistency for highly erratic data. Stochastic consistency is a new cache consistency model which ensures that the cache-source deviation is within a user-specified bound with a given confidence level. Our approach guarantees stochastic consistency with high fidelity and allows the server to remain stateless, thus achieving excellent scalability and reliability.

Second, we propose a novel approach to approximating aggregate queries over erratic streams. We perform deferred evaluations of aggregates while ensuring that errors stay within user-specified bounds. Our approach models the behaviour of aggregates as Brownian motions and adaptively determines the next query evaluation time. This approach significantly reduces computation overhead at the server and achieves high scalability. We also study the processor allocation issue in such an aggregate evaluation system.

Finally, we study how to cache erratic sensor data in a sensor network environment in a power-efficient way. Given user-provided consistency requirements, sensor sources must deliver updates to the base station caches frequently enough. We propose a novel dynamic duty cycling scheme which keeps sensors in sleep mode most of the time but awakens them before an update message is about to arrive. We simulate this approach using the ns-2 simulator and show that it outperforms existing duty cycling schemes, such as GAF, in terms of throughput and power consumption.
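
As a rough illustration of the stochastic consistency idea described in the abstract, the following Python sketch computes how long a cached value could go without a refresh if the source behaved like a driftless Brownian motion with known volatility. This is a simplified textbook calculation, not the dissertation's algorithm: the function name `max_refresh_interval`, its parameters, and the endpoint-only deviation check (rather than a bound over the whole interval) are assumptions made purely for illustration.

```python
from statistics import NormalDist


def max_refresh_interval(sigma: float, bound: float, confidence: float) -> float:
    """Longest interval t such that a driftless Brownian motion with
    volatility sigma satisfies P(|X(t) - X(0)| <= bound) >= confidence.

    Hypothetical helper: only the deviation at time t is checked,
    not the maximum deviation over the whole interval [0, t].
    """
    if sigma <= 0 or bound <= 0:
        raise ValueError("sigma and bound must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Two-sided standard normal quantile z with P(|Z| <= z) = confidence.
    z = NormalDist().inv_cdf((1.0 + confidence) / 2.0)
    # X(t) - X(0) ~ N(0, sigma^2 * t), so we need bound / (sigma * sqrt(t)) >= z.
    return (bound / (sigma * z)) ** 2


if __name__ == "__main__":
    # Assumed example values: volatility 0.5 per sqrt(time unit),
    # deviation bound 1.0, confidence level 95%.
    print(max_refresh_interval(sigma=0.5, bound=1.0, confidence=0.95))
```

Under these assumptions, tightening the bound or raising the confidence level shortens the interval, which matches the intuition that stricter consistency requires more frequent pulls or re-evaluations.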
Keywords/Search Tags:Data, Erratic, Streams