Extensible Markov model: An efficient data mining framework for spatiotemporal stream data

Posted on: 2008-05-04
Degree: Ph.D.
Type: Dissertation
University: Southern Methodist University
Candidate: Meng, Yu
Full Text: PDF
GTID: 1458390005480167
Subject: Computer Science
Abstract/Summary:
The volume of data is increasing exponentially as a result of advances in data generation, collection and storage technologies. A special form of data is the data stream, which is generated continuously and rapidly over time. Data streams have become ubiquitous in the computing environment and require prompt processing in many applications. Data stream mining has become a demanding area of research.

Although data mining is a fairly well-researched area, many conventional data mining methods may not work effectively on data streams. In this research, we present a dynamic, incremental, adaptive and efficient framework, the Extensible Markov Model (EMM), for mining spatiotemporal data streams. EMM aims at online mining of data streams that are spatiotemporal in dimension, large in size, evolving in time, heterogeneous in format and distributed in source. We concentrate primarily on the following aspects: (1) the capacity to model spatiotemporal data streams, (2) the capacity to find local patterns, (3) the identification of developing trends, and (4) interaction with user feedback.

Modeling is indispensable in data stream mining. Previous work at SMU proposed a variation of the Markov chain, called EMM, for modeling spatiotemporal data. In this work we build on that prior work by formalizing the EMM for mining spatiotemporal, time-evolving data streams. EMM uses an open-ended number of Markov chain states to accommodate the dynamic temporal behavior of data streams. EMM groups similar spatiotemporal data events into clusters and maps each cluster to a state of the Markov chain. Efficiency is ensured by the incremental nature of the modeling process. By adding the Markov property as a global restriction, the granularity of the clusters is determined for optimal performance. The global modeling result is presented as a synopsis, which provides a basis for data stream mining tasks. The results of prediction experiments demonstrate the effectiveness of EMM's modeling capacity. A novel method is presented for updating the synopsis to reflect current behavior and for detecting developing trends in time-evolving data streams.

The local pattern finding capability of EMM is explored through a number of applications. We first examine its application to anomaly detection. Based on a set of predefined concepts and rules, EMM achieves a high detection rate on anomalies. As the second application, a sophisticated mining task on the synopsis is investigated to detect DDoS network intrusions. Furthermore, we explore efficient ways of incorporating user feedback into the mining tasks of EMM in order to improve performance.

Finally, we propose directions for future work. We present a scheme to adaptively adjust the granularity of the clusters so as to optimize the modeling. We also briefly investigate a relational hierarchy for using EMM in a distributed environment.
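
The modeling idea summarized in the abstract (group similar events into clusters, map each cluster to a Markov chain state, and update the model incrementally) can be illustrated with a minimal sketch. This is not the dissertation's algorithm: the class name `EMMSketch`, the Euclidean distance, the fixed `threshold`, and the running-mean centroid update are all assumptions chosen for illustration only.

```python
import math


class EMMSketch:
    """Hypothetical, simplified EMM-like model (illustration only)."""

    def __init__(self, threshold):
        self.threshold = threshold  # assumed fixed cluster radius
        self.centroids = []         # one centroid per state
        self.sizes = []             # events absorbed by each state
        self.transitions = {}       # (from_state, to_state) -> count
        self.current = None         # state of the most recent event

    def _nearest(self, x):
        """Return (index, distance) of the closest state, or (None, inf)."""
        best, best_d = None, math.inf
        for i, c in enumerate(self.centroids):
            d = math.dist(x, c)
            if d < best_d:
                best, best_d = i, d
        return best, best_d

    def update(self, x):
        """Absorb one event: pick or create a state, then count the transition."""
        x = tuple(float(v) for v in x)
        state, d = self._nearest(x)
        if state is None or d > self.threshold:
            # No sufficiently similar cluster: extend the chain with a new state.
            self.centroids.append(x)
            self.sizes.append(1)
            state = len(self.centroids) - 1
        else:
            # Move the centroid toward the new event (running mean).
            n = self.sizes[state] + 1
            c = self.centroids[state]
            self.centroids[state] = tuple(
                ci + (xi - ci) / n for ci, xi in zip(c, x)
            )
            self.sizes[state] = n
        if self.current is not None:
            key = (self.current, state)
            self.transitions[key] = self.transitions.get(key, 0) + 1
        self.current = state
        return state

    def transition_prob(self, i, j):
        """Estimate P(next = j | current = i) from the transition counts."""
        total = sum(n for (a, _), n in self.transitions.items() if a == i)
        return self.transitions.get((i, j), 0) / total if total else 0.0


model = EMMSketch(threshold=1.0)
for event in [(0, 0), (0.1, 0), (5, 5), (5.1, 5), (0, 0.1), (5, 5)]:
    model.update(event)
print(len(model.centroids), model.transition_prob(0, 1))
```

Each event is processed once and only the centroids and transition counts are stored, which is the sense in which such a model is incremental. In a sketch like this, an event that creates a new state or follows a low-probability transition could be treated as a candidate anomaly, although the rules actually used in the dissertation are not given in the abstract.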
Keywords/Search Tags: Data, EMM, Mining, Spatiotemporal, Stream, Markov, Work, Modeling