Font Size: a A A

Finding frequent patterns from graph datasets

Posted on:2006-04-09Degree:Ph.DType:Dissertation
University:University of MinnesotaCandidate:Kuramochi, MichihiroFull Text:PDF
GTID:1458390008970555Subject:Computer Science
Abstract/Summary:
Efficient algorithms for finding frequent patterns (both sequential and non-sequential) in very large datasets have been one of the key success stories of data mining research. Nevertheless, as data mining techniques have been increasingly applied to non-traditional domains, there is a need to develop efficient and general-purpose frequent pattern discovery algorithms that are capable of capturing the spatial, topological, geometric, and/or relational nature of the datasets that characterize these domains. In many application domains, there exist datasets that possess inherently structural or relational characteristics, which are suitable for graph-based representations, and can greatly benefit from graph-based data mining algorithms.;In this dissertation we focus on the problem of finding frequently occurring patterns from graph datasets. Various formulations of the frequent subgraph discovery are explored that are suited for different types of graphs and applications with varying complexity-completeness trade-offs. We developed five efficient algorithms, each of which is designed for a different typical problem setting in the form and type of the input graph dataset, and the completeness. The first algorithm is designed for finding frequent subgraphs from a set of labeled undirected graphs, and the second one is for finding frequent subgraphs from a set of labeled undirected geometric graphs. While those two algorithms are aimed at relatively small input graphs, third algorithm specifically deals with a single input graph which is large and sparse. The fourth algorithm is different from the previous three in the sense that it is heuristic and more scalable and it can efficiently operate on an input graph with the higher vertex degrees. The fifth algorithm uses parallelism in order to scale up frequent subgraph discovery by distributing the partitioned search spaces over multiple processors and by dynamic load balancing. The performance and scalability of those algorithms are carefully evaluated and analyzed through extensive experiments using synthetic and real graph datasets from various application domains. The results of the experiments illustrate the efficiency of those algorithms in detail.
Keywords/Search Tags:Datasets, Finding frequent, Graph, Algorithms, Patterns, Domains
Related items