With the rapid development of modern information technology, data structure becomesmore and more complex. As a general data structure, graph can model complex structure dataand the relationship between datum, which makes the graph data management more and moreimportant. It was applicated in many fields, for example, in the medical informatics,bioinformatics, social networks, geographic information systems etc.As an important research direction in the graph data management, graph query hasextensive research significance. It helps people to retrieve valuable information from a largenumber of complex data quickly. As an emerging discipline, more and more researchers areconcerned about the field of data query.Currently, graph data query is basically using the “filter–verification” mechanism. Firstlyfilter In the first, obtain a candidate set of the query.Then make a accurately validation. Thepurpose of filtering is to minimize the times of verification, which is a test of graphisomorphism. As we all know, the graph isomorphism testing has been proved as an NP-hardproblem. How to get a more effective candidate set to reduce the computational complexity andhow to improve the efficiency of the query is a wide range of meaningful research topic.Therefore, in order to improve the efficiency of graph query, researchers use the the indexmechanism in the graph data management system. With the offline indexing we can reduce theonline query execution time and space consumption. This paper focus on how to effectivelybuild the index during graph query filtering phase. We have made the following works:1. Fundamental knowledge of graph query is introduced. It includes graph definition, thegraph classification and the processing technologies of query. These works have laid atheoretical foundation for the future research work.2. This paper will put forward a new indexing algorithm in the use of frequent subgraph tosolve the problem of "conflict" in GIndex algorithm. Firstly, we get the feature set by thefrequent subgraph. Then organize the feature set into the DFS tree, and transform them intoDFS code by serializing the feature graphs. Finally, we use a double hash function to store thecode and build an index. The entire algorithm will solve the problem of conflict in the encodingphase, which makes filtration in filtering phase more efficient3. For graph approximate query problem, this paper will propose a new query algorithmthat using the graph’s topology. We discompose the topology of graph into different size ofitemsets and organize them into a DAG Graph. By finding the maximum common substructurethat is constructed from minimum spanning tree, an expanded spanning tree meeting the condition of the approximate threshold can be achieved. Therefore we can transform the treeinto graph standard code, which will be used in quickly match query in DAG to obtain theapproximate candidate set. The algorithm would improve the efficiency because of enumeratingstrategy which trade space for time.4. Finally, experiments have been carried out. The correctness and efficiency of ouralgorithms have been proved by comparing the result with the existing approaches. |