Mining, indexing and similarity search in large graph data sets

Posted on:2007-05-23

Degree:Ph.D

Type:Dissertation

University:University of Illinois at Urbana-Champaign

Candidate:Yan, Xifeng

GTID:1448390005464726

Subject:Computer Science

Abstract/Summary:

Scalable analytical algorithms and tools for large graph data sets are in great demand across domains from software engineering to computational biology, because it is very difficult, if not impossible, for human beings to manually analyze any reasonably large collection of graphs given their high complexity. In this dissertation, we investigate two long-standing fundamental problems: (1) given a graph data set, what are the hidden structural patterns, and how can we find them? and (2) how can we index graphs and perform similarity search in large graph data sets? Graph pattern mining is computationally expensive because subgraph isomorphism is NP-complete. Previous solutions incur unavoidable overhead because they rely on joining two graphs to form larger candidates. We develop a graph canonical labeling system, gSpan, and show both theoretically and empirically that this kind of join operation is unnecessary. Graph indexing, the second problem addressed in this dissertation, may incur an exponential number of index entries if all of the substructures in a graph database are used for indexing. Our solution, gIndex, takes a novel approach based on mining frequent and discriminative graph substructures, which leads to a compact but effective graph index structure that is orders of magnitude smaller and an order of magnitude faster than traditional approaches.

Besides graph mining and search, this dissertation provides a thorough investigation of pattern summarization, pattern-based classification, constraint pattern mining, and graph similarity search, all of which could leverage graph patterns. It also explores several critical applications in bioinformatics, computer systems, and software engineering, including gene relevance network analysis for functional annotation and program flow analysis for automated software bug isolation.

The developed concepts, theories, and systems may significantly deepen the understanding of data mining principles in structural pattern discovery, interpretation, and search. The formulation of a general graph information system through this study could provide fundamental support to graph-intensive applications in multiple domains.
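
To make the indexing idea in the abstract more concrete, the following is a minimal sketch of feature-based "filter, then verify" graph search. It is not gIndex itself: for brevity it assumes the only indexed substructures are single labeled edges, whereas the dissertation selects frequent and discriminative substructures. The graph representation, the function names (`edge_features`, `build_index`, `is_subgraph`, `search`) and the toy database are all hypothetical and chosen for illustration only.

```python
from collections import defaultdict
from itertools import permutations

# Assumed representation: a graph is (labels, edges), where labels maps
# vertex -> label and edges is a list of undirected (u, v) pairs.

def edge_features(labels, edges):
    """Label pairs occurring on edges: the simplest substructure features."""
    return {tuple(sorted((labels[u], labels[v]))) for u, v in edges}

def build_index(database):
    """Inverted index: feature -> ids of the graphs that contain it."""
    index = defaultdict(set)
    for gid, (labels, edges) in database.items():
        for feat in edge_features(labels, edges):
            index[feat].add(gid)
    return index

def is_subgraph(query, graph):
    """Brute-force (non-induced) subgraph isomorphism test; exponential time."""
    q_labels, q_edges = query
    g_labels, g_edges = graph
    g_edge_set = {frozenset(e) for e in g_edges}
    q_nodes = list(q_labels)
    # Try every injective assignment of query vertices to graph vertices.
    for image in permutations(g_labels, len(q_nodes)):
        mapping = dict(zip(q_nodes, image))
        labels_ok = all(q_labels[v] == g_labels[mapping[v]] for v in q_nodes)
        edges_ok = all(frozenset((mapping[u], mapping[v])) in g_edge_set
                       for u, v in q_edges)
        if labels_ok and edges_ok:
            return True
    return False

def search(query, database, index):
    """Filter with the index, then verify only the surviving candidates."""
    candidates = set(database)
    for feat in edge_features(*query):
        candidates &= index.get(feat, set())
    answers = {gid for gid in candidates if is_subgraph(query, database[gid])}
    return candidates, answers

# Toy database (hypothetical data).
DATABASE = {
    "g1": ({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2), (0, 2)]),
    "g2": ({0: "C", 1: "O", 2: "N"}, [(0, 1), (1, 2)]),
    "g3": ({0: "O", 1: "C", 2: "O", 3: "C"}, [(0, 1), (1, 2), (2, 3)]),
    "g4": ({0: "N", 1: "N"}, [(0, 1)]),
}
INDEX = build_index(DATABASE)

# Query: the path O - C - O.
QUERY = ({0: "O", 1: "C", 2: "O"}, [(0, 1), (1, 2)])
CANDIDATES, ANSWERS = search(QUERY, DATABASE, INDEX)
print(sorted(CANDIDATES), sorted(ANSWERS))
```

In this toy run the index discards `g4` without any isomorphism test, but `g1` and `g2` still pass the filter and are rejected only during verification. Single-edge features are weak filters, which suggests why indexing larger, more discriminative substructures can shrink the candidate set and reduce the number of expensive verification steps.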

Keywords/Search Tags:

Graph, Mining, Search, Indexing, Similarity

Related items

1	Mining, indexing, and search approaches to entity and graph information retrieval for chemoinformatics
2	Mining and indexing graph databases
3	Search and indexing of high-dimensional feature spaces for similarity retrieval
4	Research On Graph Query Based On Non-mining Graph Index
5	Research On Graph Similarity Search Based On Frequent Sub-patterns
6	Research On Graph Similarity Search On Uncertain Graph Databases
7	Representing And Indexing Trajectories For Efficient Similarity Search With TRI Framework
8	Research On IIoT Time Series Similarity Search Technologies
9	Similarity Graph-based Scientific Literature Search Key Technology Research
10	Study On Match Similarity Search