Font Size: a A A

Relational discovery in sequentially-connected data streams: Efficient algorithms for lossless pattern discovery and change detection

Posted on:2006-11-09Degree:Ph.DType:Dissertation
University:The University of Texas at ArlingtonCandidate:Coble, Jeffrey AllenFull Text:PDF
GTID:1458390008973591Subject:Computer Science
Abstract/Summary:
We are developing relational data mining techniques that discover structural patterns consisting of complex relationships between entities. Our research is particularly applicable to domains in which the data is event driven, such as counter-terrorism intelligence analysis. Such analytical tasks require discovery of relational patterns between events and actors so that these patterns can be exploited for prediction and action. An additional complexity of these event-driven problems is that they are often continuous, with data streaming in over a long period of time or even indefinitely. This presents the need for discovery algorithms to repeatedly assimilate new data into the discovery process. However, reprocessing the accumulated data after receiving each new increment is often an intractable task because of the computational demands of most relational discovery methods.; We have developed an algorithm to mine relational data streams by summarizing discoveries from previous data increments so that the globally-best patterns can be computed by examining only the new data increment and isolated sets of sequentially-connected data that spans increment boundaries. Our algorithm will find pattern instances that span increment boundaries by using a targeted, localized search, based on the set of globally-best substructures, and an efficient graph exploration technique that restricts the range of the graph that must be explored.; Many continuous problems are also dynamic in nature, requiring discovery algorithms to be capable of recognizing and adapting to change over time. We introduce an algorithm with which we are able to use a measure of central tendency for a set of graphs, used to compute a representative point in graph space. We use these points, along with a distance metric, to measure change in sequential sets of the best patterns discovered from successive data increments. The objective of this work is to enable a method for measuring pattern drift in relational data streams, where the salient patterns may change in prevalence and structure over time. With a measure of central tendency for graph data, along with a method for calculating graph distance, we have a framework with which we can begin to adapt time-series techniques to relational data streams.
Keywords/Search Tags:Relational, Data streams, Discovery, Pattern, Sequentially-connected data, Change, Algorithm
Related items