Finding frequent patterns from graph datasets

Posted on:2006-04-09

Degree:Ph.D

Type:Dissertation

University:University of Minnesota

Candidate:Kuramochi, Michihiro

Full Text:PDF

GTID:1458390008970555

Subject:Computer Science

Abstract/Summary:

Efficient algorithms for finding frequent patterns (both sequential and non-sequential) in very large datasets have been one of the key success stories of data mining research. Nevertheless, as data mining techniques have been increasingly applied to non-traditional domains, there is a need to develop efficient and general-purpose frequent pattern discovery algorithms that are capable of capturing the spatial, topological, geometric, and/or relational nature of the datasets that characterize these domains. In many application domains, there exist datasets that possess inherently structural or relational characteristics, which are suitable for graph-based representations, and can greatly benefit from graph-based data mining algorithms.;In this dissertation we focus on the problem of finding frequently occurring patterns from graph datasets. Various formulations of the frequent subgraph discovery are explored that are suited for different types of graphs and applications with varying complexity-completeness trade-offs. We developed five efficient algorithms, each of which is designed for a different typical problem setting in the form and type of the input graph dataset, and the completeness. The first algorithm is designed for finding frequent subgraphs from a set of labeled undirected graphs, and the second one is for finding frequent subgraphs from a set of labeled undirected geometric graphs. While those two algorithms are aimed at relatively small input graphs, third algorithm specifically deals with a single input graph which is large and sparse. The fourth algorithm is different from the previous three in the sense that it is heuristic and more scalable and it can efficiently operate on an input graph with the higher vertex degrees. The fifth algorithm uses parallelism in order to scale up frequent subgraph discovery by distributing the partitioned search spaces over multiple processors and by dynamic load balancing. The performance and scalability of those algorithms are carefully evaluated and analyzed through extensive experiments using synthetic and real graph datasets from various application domains. The results of the experiments illustrate the efficiency of those algorithms in detail.

Keywords/Search Tags:

Datasets, Finding frequent, Graph, Algorithms, Patterns, Domains

Related items

1	New techniques for efficiently discovering frequent patterns
2	Study On Frequent Pattern Mining Algorithms And Pruning Strategies
3	Research Of Mining Frequent Closed Patterns In Microarray Datasets
4	Mining frequent structural patterns from XML datasets
5	Research And Implementation On Algorithms Of Frequent Subgraph Patterns Mining Upon Uncertain Graph Data
6	The Techniques Research On Frequent Pattern Mining
7	An Efficient Algorithm Of Mining Frequent Subgraph Patterns In Uncertain Graph Database
8	Research On Graph Similarity Search Based On Frequent Sub-patterns
9	Research On Mining And Querying Frequent Patterns Based On Simplified Frequent Pattern Tree
10	Research On Technologies And Algorithms Of Web Logs Mining