Font Size: a A A

Subgraph Query In Graph Databases

Posted on:2010-07-20Degree:DoctorType:Dissertation
Country:ChinaCandidate:L ZouFull Text:PDF
GTID:1488302753467964Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
As an important data structure in computer science, graph can be used to model many complex objects, such as social networks, compound structures and so on. Furthermore, the growing popularity of graph databases has generated interesting data management problems, such as sub-graph search, frequent structural pattern mining and so on. For example, approximate 4000 new compound structures are added into SCI Finder database everyday; there are 150 million active users in Facebook (one of famous social network) Web sites. This dissertation addresses three key issues in graph databases, that are 1) how to design efficient storage and index strategies; 2) how to answer query over graph databases efficiently; 3) how to discovery knowledge from large graph databases.Subgraph query is a fundamental operation in graph databases. Specifically, given a query graph Q, subgraph query reports all data graphs that contains Q. Due to NP-complete hardness of subgraph isomorphism, subgraph query algorithms always adopt "filter-and-refine" framework. In filtering process, according to some necessary condition of subgraph isomorphism, we filter out most false alarms (the data graphs that do not contain query Q) to obtain candidates; Then, we perform subgraph isomorphism algorithm on candidates to fix final results. Existing subgraph query algorithms use "feature" to perform filtering process. However, they ignore the relationships between features. Considering both features and the relationships between them, we design an efficient pruning strategy to speed up query processing. Furthermore, existing subgraph query algorithms cannot handle frequent updates in graph database efficiently. They have to rebuild the index from scratch. Based on spectral graph theory, we design a novel graph coding technique, called GCoding. Based on GCoding, we map graph structural information into numerical space. Then, we build the index on the converted numerical space. This method not only speeds up query processing, but also handle graph database maintenance efficiently.Given a query graph Q, finding all matching positions of Q in a very large graph G is useful in social network analysis, such as finding some specified community in social network and finding a functional group in a biological network. Since we care more about the connectivity in a large graph, the matching definition is different from subgraph isomorphism. According to the shortest path distance, we propose a novel matching definition. To reduce the huge search space, we map vertices in a large graph into points in vector space. Then, we find candidate matchings in the converted vector space. At last, we fix final results in the original large graph.This dissertation also address two graph mining problems, that are frequent structural pattern mining and correlative subgraph mining. We propose an efficient pattern growth algorithm to find frequent structural patterns without expensive verification process. Given a query graph Q, correlation pattern query is to find all correlative patterns with regard to Q. In order to speed up query processing, we design an pattern-growth algorithm to find final results.Considering the "domination" between records in vector space, we build a graph-structure index, called DG-index. Based on DG-index, we convert top-k query into a graph travel problem.
Keywords/Search Tags:Graph Database, Sub-graph Query, Frequent structural Patterns, Top-K Query
PDF Full Text Request
Related items