Query and Mining in Large Graph Databases

Posted on:2014-10-13

Degree:Ph.D.

Type:Thesis

University:The Chinese University of Hong Kong (Hong Kong)

Candidate:Zhu, Yuanyuan

GTID:2458390008454796

Subject:Engineering

Abstract/Summary:

Graphs have a powerful ability to model complex structural relationships among data objects and have been widely used in various applications. As these application domains develop, graph databases have become large and continue to grow rapidly. This brings new challenges for graph query and mining, among which we focus on three problems: first, how to find a correspondence between the nodes of two large graphs so that substructures in one graph are mapped to similar substructures in the other; second, how to retrieve graphs similar to a query graph from a database consisting of a large number of graphs; and third, how to extract subgraph features to build an automated classification model for a graph database whose graphs belong to different classes.

In this thesis, for the first problem, we propose a novel two-stage approach that can efficiently match two large graphs with thousands of nodes while achieving high matching quality. In the first stage, we design an anchor-selection/expansion scheme to construct a good initial matching heuristically. In the second stage, we propose a new approach to refine the initial matching and analyze the optimality of our refinement algorithm. The approach produces an approximate matching efficiently and with high quality.

To address the second problem, we introduce a new graph distance measure based on the maximum common subgraph (MCS) of two graphs, which captures both the common and the differing structures of the two graphs. Since computing the MCS of two graphs is NP-hard, we propose a fast algorithm for answering top-k graph similarity queries that significantly reduces the number of MCS computations. The algorithm prunes unqualified graphs using three lower bounds: the first two are derived from the structures of the two graphs, and the third follows from the triangle inequality of the distance measure. Three index schemes, offering different tradeoffs between pruning power and construction cost, are designed to assist query processing.

For the third problem, we identify two main issues with the widely used discriminative score for feature selection and introduce a new diversified discriminative score that exploits diversity in addition to discriminative power. We analyze the properties of the proposed score from several perspectives and demonstrate that it makes positive and negative graphs more separable. We also propose new feature-selection algorithms based on this score and show that they achieve high classification accuracy.
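The lower-bound pruning idea for top-k similarity queries can be illustrated with a minimal sketch. It is not the thesis's algorithm. It assumes every edge carries a unique label pair, so the common subgraph of two graphs reduces to the intersection of their edge sets, and the stand-in distance is the size of the symmetric difference (a metric). The names `distance`, `size_lower_bound`, `top_k`, and the single-pivot index are hypothetical. Only two bounds are shown: one from graph sizes and one from the triangle inequality.

```python
import heapq

def distance(a, b):
    # Stand-in for an expensive MCS-based distance (assumption):
    # number of edges not shared by the two edge sets.
    return len(a ^ b)

def size_lower_bound(a, b):
    # Cheap structural bound: the distance is at least the size difference.
    return abs(len(a) - len(b))

def top_k(query, database, k, pivot, pivot_dists):
    """Return (sorted [(distance, index)], number of exact distance computations)."""
    d_qp = distance(query, pivot)
    n_exact = 1
    heap = []  # max-heap on distance, stored as (-distance, index)
    for i, g in enumerate(database):
        # Triangle inequality: d(q, g) >= |d(q, p) - d(p, g)|.
        lb = max(size_lower_bound(query, g), abs(d_qp - pivot_dists[i]))
        if len(heap) == k and lb >= -heap[0][0]:
            continue  # g cannot beat the current k-th best; skip the exact computation
        d = distance(query, g)
        n_exact += 1
        if len(heap) < k:
            heapq.heappush(heap, (-d, i))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, i))
    return sorted((-nd, i) for nd, i in heap), n_exact

# Hypothetical toy database of edge sets.
database = [
    frozenset({("a", "b"), ("b", "c"), ("c", "d")}),
    frozenset({("a", "b"), ("b", "c")}),
    frozenset({("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")}),
    frozenset({("x", "y"), ("y", "z")}),
    frozenset({("a", "b"), ("x", "y"), ("y", "z"), ("z", "w"), ("w", "v"), ("v", "u")}),
]
query = frozenset({("a", "b"), ("b", "c"), ("c", "d")})
pivot = database[0]
pivot_dists = [distance(pivot, g) for g in database]  # precomputed offline as an index
results, n_exact = top_k(query, database, 2, pivot, pivot_dists)
```

In this sketch the distances to the pivot are computed once, offline, which mirrors the tradeoff between index construction cost and pruning power described in the abstract. Using more pivots would tighten the bound at a higher construction cost.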

Keywords/Search Tags:

Graph, Large, Query, New, Score

Related items

1	Research On Key Techniques Of Query Processing Over Large-scale Graph Data
2	Constraint Top-k Query For Large-scale Dynamic Graph Based On Frequent Subgraph
3	Research On Subgraph Query Method For Large-scale Dynamic And Directed Label Graph
4	Research Of Graph Query On Large Graph Database
5	Research And Implementation Of Semantic Query And Inference Based On Graph
6	Similarity Top-k Query For Large-scale Dynamic Graph
7	Similarity Nodes Query Processing Approach In The Evolution Process Of Large Dynamic Graph
8	Research On Optimization Of Max-Score Query Processing Technology
9	Research On Key Techniques Of Reachability Query For Large-scale Digraphs
10	Research On Query Processing Technologies Over Large Scale Knowledge Graphs