Research On Subgraph Query Of RDF Graph Data In Distributed Environment

Posted on: 2022-01-03
Degree: Master
Type: Thesis
Country: China
Candidate: Q R Huang
GTID: 2518306344472144
Subject: Computer software and theory
Abstract/Summary:
As the Web has grown, RDF data has become a significant part of today's massive data, and the structure of RDF graphs has become more complex as data volumes increase. How to query RDF graphs efficiently in such an environment is therefore an active research topic. Traditional graph query and graph traversal methods produce highly redundant intermediate results, and processing subgraph set queries on a single machine cannot match efficiently when the data volume is extremely large. Moreover, when a set of subgraph queries is processed, the query graphs must be iterated over multiple times to evaluate their common subgraphs, so execution efficiency is low. To address these problems, this thesis studies the current mainstream RDF subgraph query methods and proposes a subgraph query method suited to large, highly complex RDF graphs, together with a distributed batch-processing method for sets of RDF subgraph queries. The main contributions are as follows:

1. A subgraph query method based on node cost. To reduce the intermediate result set generated during RDF subgraph query processing, the proposed RNSV-SQ algorithm uses the graph structure to decompose an RDF query subgraph into stars, and then determines the order of these stars with a custom node cost model. Querying in this order effectively reduces the intermediate result set and thereby improves RDF subgraph query performance. An improved MapReduce-based algorithm, MR-RNSV-SQ, then executes the query in a distributed environment, matching one star pattern per iteration, which further improves the efficiency of RDF subgraph queries.

2. A subgraph set query method based on the composite association tree. To reduce repeated computation of common subgraphs and make full use of the candidate result set when processing a set of RDF subgraph queries, the improved RDCM clustering algorithm, which is based on the dependence of RDF graphs, first groups the RDF subgraph set so that the triple data of RDF subgraphs within a group are strongly correlated and the triple data of subgraphs in different groups are weakly correlated. Second, a composite relationship, the composite association graph, is built for each group; this graph is then pruned by deleting its redundant nodes and edges to obtain the composite association tree. Finally, using the composite association tree, a MapReduce-based RDF subgraph set query method, CAT-FQ, is proposed. In a parallel computing environment, it obtains the query results of the RDF subgraph set by traversing the composite association tree once, processes the RDF subgraph set as a distributed batch query, and improves the overall query efficiency of the set through MapReduce operations.

The thesis designs experiments on data sets to compare RDF subgraph query methods. For the RDF subgraph set query method, the effect of the clustering method is verified first, and the set query method is then compared experimentally. The experimental analysis indicates that the proposed algorithms effectively improve the query efficiency of RDF subgraphs and RDF subgraph sets.
Keywords/Search Tags: RDF graph, Subgraph query, Distributed environment, Graph query optimization
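
The star decomposition and cost-based ordering described in item 1 of the abstract can be illustrated with a minimal Python sketch. This is not the RNSV-SQ algorithm itself, whose node cost model the abstract does not specify. The greedy highest-degree decomposition, the function names, the sample query, and the cost estimate (the frequency of the rarest predicate in a star) are all assumptions made for illustration.

```python
from collections import defaultdict

# A triple pattern is (subject, predicate, object); "?name" marks a variable.
# Hypothetical sample query, not taken from the thesis.
QUERY = [
    ("?x", "type", "Student"),
    ("?x", "advisor", "?y"),
    ("?x", "takesCourse", "?c"),
    ("?y", "teacherOf", "?c"),
    ("?y", "worksFor", "?d"),
]

# Hypothetical statistics: number of data triples per predicate.
PREDICATE_COUNTS = {
    "type": 1000,
    "advisor": 200,
    "takesCourse": 500,
    "teacherOf": 50,
    "worksFor": 80,
}


def decompose_into_stars(query):
    """Greedily split a query graph into stars.

    Each star is (center, edges): a center node plus every not yet
    covered triple pattern that touches it. Assumed strategy: always
    pick the node with the most uncovered edges.
    """
    remaining = set(query)
    stars = []
    while remaining:
        degree = defaultdict(int)
        for s, _, o in remaining:
            degree[s] += 1
            degree[o] += 1
        # Ties are broken by node name so the result is deterministic.
        center = max(sorted(degree), key=lambda n: degree[n])
        edges = sorted(t for t in remaining if center in (t[0], t[2]))
        stars.append((center, edges))
        remaining -= set(edges)
    return stars


def star_cost(star, predicate_counts):
    """Assumed cost: the count of the rarest predicate in the star."""
    _, edges = star
    return min(predicate_counts[p] for _, p, _ in edges)


def order_stars(stars, predicate_counts):
    """Evaluate the cheapest (most selective) stars first."""
    return sorted(stars, key=lambda s: star_cost(s, predicate_counts))


if __name__ == "__main__":
    ordered = order_stars(decompose_into_stars(QUERY), PREDICATE_COUNTS)
    print([center for center, _ in ordered])  # ['?y', '?x']
```

Under these assumed statistics, the star centered on `?y` is matched first because its rarest predicate is far less frequent, so fewer candidate bindings are carried into the join with the `?x` star. In a MapReduce setting such as the one the abstract describes, each star in this order would correspond to one matching iteration.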