Supporting knowledge discovery in data stream management systems

Posted on: 2009-01-18
Degree: Ph.D
Type: Dissertation
University: University of California, Los Angeles
Candidate: Thakkar, Hetal M
Full Text: PDF
GTID: 1448390002499472
Subject: Computer Science
Abstract/Summary:
A growing number of applications, including network traffic monitoring and highway congestion analysis, continuously generate massive data streams. Managing these streams presents many new research challenges, including Quality of Service (QoS) guarantees, windows and other synopses. Many research projects have therefore focused on building Data Stream Management Systems (DSMSs) to address these challenges [ACC03, ABW03, CCD03]. However, all of these systems are limited to simple continuous queries over data streams; they do not support advanced applications such as data stream mining. Such applications are critical in many real-world scenarios, including web click-stream analysis, market basket data mining, and credit card fraud detection. The importance of data stream mining is further illustrated by research projects that focus on devising fast and light algorithms for online mining [CWY04, JQS03, CZ04, WFY03, EKS98, FOR06, MTZ08]. Beyond devising fast and light algorithms, however, deploying online data stream mining methods presents many difficult challenges. In particular, data stream mining methods must be deployed with all the essentials that DSMSs provide for simpler applications, including QoS, load shedding, and synopses. Thus, in this dissertation we extend a DSMS into an online data mining workbench through the following research advances: (1) the power of our DSMS, namely Stream Mill, is extended to support more advanced queries, such as online mining and sequence queries, by extending its query language (namely SQL); (2) a suite of online mining algorithms is integrated into the DSMS to provide advanced mining techniques, such as ensemble-based methods [WFY03, CZ04, FOR06]; and (3) data mining models and workflows are introduced to support specification of the complete mining process.
This promotes ease of use, since any user can simply invoke an existing workflow rather than recreating it themselves. The framework also allows experts to add new mining algorithms. We demonstrate that the resulting data stream mining workbench achieves performance and extensibility that are unmatched even by static mining workbenches.
Keywords/Search Tags: Data stream, Management, Support, Applications