Font Size: a A A

Research On Classification Based Graph Indexes

Posted on:2016-02-20Degree:MasterType:Thesis
Country:ChinaCandidate:J L BaiFull Text:PDF
GTID:2308330503477885Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
As graph structure is widely used for modeling complex data, the technology of graph database is developing rapidly.How to retrieve data from graph databasequickly has become a research focus. Subgraph search and similarity search from graph database are two important kinds of graph search.Subgraph search means to retrieve the graph set of which each element contains the query subgraph. While the similarity search means to return all the graph that are similar to the query graph under a threshold.To handle these two problems, subgraph isomorphism validation and similarity computationsare necessary and they have been proved to be NP hard. Currently, the"classification filter+validation"two-phase mechanism is the main approach to improving the efficiency of the work.In this mechanism, some classification filter rules are put forward to classify the graph database, and then the features of the queryare extracted and used to find the possible result class, thus generating a smaller candidates set.Finally, the subgraph isomorphism of candidatesis verified or graph similarity is calculated to get the final result set for the query. This way, researchers may construct proper index structures according to the rules of classification filter to improve the efficiency of the query.Current index methods normally build indexes offline without considering the time series relationship between thequeries, which makesquite some of redundant querries.While current methods for similarity search always focus on how to reduce the calculation time without devoting to make the size of candidates set smaller,so the methodstill needs to scan the whole database necessarily.To solve the above problem, this paper proposed a double index method.Firstly, it makes an index named DIndex (Database Index) based on the graph database offline. Then it builds an index named HIndex (History Index) while dealing with the query. Some frequent queries are stored in the HIndex.So when these queries are searched again,we can first search the HIndex, if the target is found, we can get the result set directly without subgraph isomorphism; if not, we can search the DIndex in traditional ways to get a candidates set.In order to reduce the size of the candidatesset insimilarity search, this paper proposes a multi-layer filtering method and puts forward a proper index structure for different filter layer respectively.Firstly,itextractsthe size and vertex label informationof the query graphand then comparesthem with graphs in the database.So itcouldremove the graphs with high dissimilarity.Then ituses the xor vector multiplication of subgraph vector to get a sorted set of which the most possible results appear in the front.What’s more,it gets a proper subscript of the sorted set by calculating the mapping distance like binary search, making sure that the most possible results will appear before the subscript in the sorted set, so we can directly add the graphs withsmaller subscriptthan the subscript into the candidates set.We can also insert some graphs withsubscript lager than the subscript into the candidates set to improve the recall ratio. At last, by calculating the graph edit distance (GED) betweengraphs in the candidates set and the query,we get the results.Finally, the results of our experiments have verified the efficiency of the double index methodin the subgraph search and the multi-layer filtering method in the similarity search.In theexperiment of subgraph search, weuse different densityquery sequence to getthe query time of double index method and the traditional single index method and noindex method under different conditions. By analyzing of the experimental results, we find that with the increase of density of the query, the number of subgraph isomorphism validation by double index method is obviously lower than the other two methods, so this method can improve efficiency of the subgraph search.In the similarity search experiments, wecompare multi-layer filtering method with existing Closure-Tree method on the four aspect of index setup time, indexsize, searchtime and recall ratio.The resultsof experiments show that the multi-layer filter method in setup time, the size of the index, and query time is better thanClosure-Tree method, while it can keep the recall ratio around 95%.
Keywords/Search Tags:Graph database, Classification filter, Graph index, Subgraph search, Similarity search
PDF Full Text Request
Related items