Font Size: a A A

The Research And Implementation Of Subgraph Similarity Matchings Algorithms In Large Graphs

Posted on:2016-10-30Degree:MasterType:Thesis
Country:ChinaCandidate:Y ZhangFull Text:PDF
GTID:2370330542989421Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
As an basic data structure in computer science,graph are often used to describe a number of complex datas,which exist various relations,such as social network,protein-protein interaction network,chmical molecular structure,road network and so on.Subgraph exact matching is a kind of basic operation of graph database management,the definition of which is to identify all occurrences of a user-given query graph in a data graph.At present,there have existed a large number of reseach achievements about subgraph exact matching algorithms.However,with the rapid development of Internet technology,the data quantity of all kinds of areas have a explosive growth,the size of graph data is becoming more and more big,so that the large database graph exists inevitably noise data,and subgraph exact matching often can't satisfy the fuzzy inquiry by users.Base on above factors,as a extension of subgraph exaxt matching,subgraph similarity matching is of great significance.The definition of subgraph similarity matching is given as follows.Given a query garph q,a data graph G,a threshold ?,for each subgraph p missing at most ? from the query graph,the goal is to find out all occurrences of the subgrpah p in the data graph.The basic problem of the subgraph exact matching and the subgraph similarity matching is still the problem of subgraph isomorphism,which has been proved NP complete problem.In order to improve the efficiency of search,this thesis takes consideration of constructing index and making use of colud computing respectively,and puts forward two kinds of subgraph similarity matching approaches,that is index-based subgraph similarity matching and cloud-based subgraph similairity matching respectively.In the index-based subgraph similarity matching approach,this thesis mainly utilizes the hybrid neighborhood unit to construct the index for each vertex of the data graph,and takes advantage of the character,that is multiple query subgraphs can share a certain spanning tree,to utilize the great pruning power for the purpose of improving search efficiency.In addition,for all of subgraphs generated by the query graph,this thesis puts forward a depth-first enumeration order,which has a similar property as the frequent pattern mining,so that it can prune unmatched subgrpahs.In the cloud-based subgraph similarity matching approach,this thesis mainly take four-step to achieve complete the search algorithm.Firstly,relaxing the query grapph into subgraphs,each of which conforms to ? constraint.Secondly,decomposing these subgraphs into smaller query structures,that is h trees.Thirdly,achieving parallel exact matching for these h trees in the machines.Finally,achieving parallel joining operation to obtain the final matching results.Finally,this thesis will do large number of experiments for index-based subgraph similarity matching approach and cloud-based subgraph similarity matching approach to do comparison and analysis.Real and synthetic data sets are employed to demonstrate the efficiency and scalability in perfoming subgrpah similarity matching over large graph data.
Keywords/Search Tags:large graph, subgraph similarity matching, unit index, cloud environment
PDF Full Text Request
Related items