Font Size: a A A

Research Of Query Technologies On Graph Data

Posted on:2010-03-20Degree:DoctorType:Dissertation
Country:ChinaCandidate:X T LiFull Text:PDF
GTID:1118360302965497Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Graph is a general data structure used to describe structural or semi-structural data,such as XML, WWW, Social Network, Chemical Compounds, Protein and Gene Inter-section Network, and so on. While the graph theory gets successfully deployed in variousareas, graph data accumulates rapidly. As a result, the enormous amount of data makesit extremely difficult to analyze and extract meaningful knowledge out of the raw data.Compared with traditional query algorithms, graph data query has its own uniqueness andchallenges: The data structure is too sophisticated to manipulate. Subgraph isomorphismis a proven NP-hard problem. Unfortunately, subgraph isomorphism is an inevitable pro-cess in graph query. The types of graph data are much more than traditional data. Suchchallenges bring opportunities to graph data query technologies.In this paper, after induction and analysis of current research results, we study thefrequent subgraph query, supergraph query, containment query, and intersection subgraphquery problems. The main contributions and innovations of this paper are:I. The current efficient frequent subgraph query algorithms utilize the method ofdepth first search during subgraph expanding. Along with the expanding of subgraph,subgraph isomorphism test implements. Although depth first search successfully avoidsa number of middle results, it does result in more complicated computation. In this pa-per, an efficient frequent subgraph query algorithm is proposed. This algorithm producesfrequent subtrees first, and then it extends these subtrees to subgraphs. In the process offrequent subtree generation, it reduces the time complexity of subtree isomorphism corre-sponding to subgraph isomorphism. On the other hand, the expending stage from subtreeto subgraph utilizes broad first search method to reduce the complexity. The theoreticalanalysis and experimental results show that this query method promotes the query effi-ciency O(n1/2·logn) times against other algorithms while returning correct answers.II. Supergraph query is a filter-verification process. Graph dataset is filtered to forma smaller and more precise candidate answer set to promote efficiency. The principleof filtering method in supergraph query is the inclusion logic - if graph A is a subgraphof graph B, all subgraphs of A are subgraphs of B. Graph query algorithms build indexon frequent features, such as frequent paths, frequent subtrees, and frequent subgraphs. But, when a query graph is given, the first thing of query is formalizing all subgraphs ofquery graph and taking subgraph isomorphism tests between those subgraphs and indexedfeatures. This process is very time consuming. In this paper, a novel supergraph queryalgorithm is proposed to solve this problem. It uses a mapping between vertices and fre-quent features to avoid the enumerating stage while compress the size of candidate answerset to raise the efficiency of the algorithm. Experimental results show that this algorithmis much more efficient than other supergraph query algorithms.III. There is another query type in graph query algorithms which named containmentquery. In containment query, the filtering method is exclusion logic: If graph A containsa feature that graph B does not contain, then A should not be a supergraph of B. Querieson the index which based on exclusion logic test subgraph isomorphism between querygraph and indexed features to get the features not contained by query graph. In this paper,a closed frequent feature based index is proposed to reduce the candidate answer set. Theindex features are organized as depth first search tree which forms during frequent sub-graph query. Through this structure, it can accelerate the performance of the algorithmfurther.IV. Frequent subgraph query, supergraph query and containment query cannot an-swer all queries in graph database. The maximum common subgraph set query, whichcan be transformed into supergraph query or containment query under some condition, isnot well studied. In this paper, an algorithm is proposed to solve this problem. It decom-poses graph database into a vertex induced directed acyclic graph to forming a part of theindex. The other part of the index is edge list along with all nodes in the directed acyclicgraph. Using this index structure, the query algorithm can not only answer maximumcommon subgraph set queries, but also do it in a very efficient way. To the best of ourknowledge, this paper is the first paper regarding maximum common subgraph set queryproblem.
Keywords/Search Tags:graph query, subgraph isomorphism, frequent subgraph, frequent subgraphquery, supergraph query, containment query, intersection subgraph query
PDF Full Text Request
Related items