
A flexible data mining architecture for monitoring data streams

Posted on: 2006-11-27
Degree: Ph.D.
Type: Dissertation
University: University of California, Santa Barbara
Candidate: Bulut, Ahmet
Full Text: PDF
GTID: 1458390008956689
Subject: Computer Science
Abstract/Summary:
Data streams are ubiquitous: performance measurements in business process management, faults and alarms in network traffic management, transactions in retail chains, ATM operations in banks, log records generated by web servers, and sensor network data are some specific examples. In almost all of these applications, the data volume is massive, up to several terabytes, and it grows further with the rapid arrival of new tuples. Traditional DBMSs are ill-equipped to process data streams in real time and do not provide adequate support for continuous queries posed over these streams.

This dissertation outlines models and issues in designing an efficient Data Stream Management System (DSMS) called Stardust. The system can handle a diverse set of continuous queries that fit naturally into the mold of data stream applications. We developed wavelet-based approximation schemes that maintain multiple levels of information over streams of data in order to answer queries efficiently.

In centralized DSMS models, a stream is summarized at a central site, and all user queries are processed at this site. In data- and query-intensive environments, the central site can become a bottleneck. As a remedy, we developed adaptive replication algorithms that disseminate stream summaries computed at a central site to interested clients. We tested the distributed version of the system on a number of testbeds. In the first scenario, Stardust exploits the scalability and communication load balancing provided by content-based routing schemes for efficient distributed stream processing. In the second scenario, we integrated Stardust into a real-time decision support system for nondestructive health monitoring using a wireless network of sensors. The system trades off accuracy for efficient processing of sensor data in order to reduce communication overhead and power consumption.

Finally, we built an event detection framework for monitoring a set of distributed network elements. The goal is to detect potentially interesting incidents specified by users in terms of a multitude of race conditions across a set of routers while maintaining a low monitoring overhead.
Keywords/Search Tags: Data, Stream, Monitoring, Network