Font Size: a A A

Research On Large-scale RDF Data Query Algorithm Based On Graph

Posted on:2021-03-27Degree:MasterType:Thesis
Country:ChinaCandidate:J J LiuFull Text:PDF
GTID:2428330647961932Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
The Resource Description Framework(RDF)can express rich semantic information and is widely used in the metadata description of the knowledge graph.With the development of semantic web information extraction technology,a single RDF dataset has reached the scale of billions of triples.The SPARQL query language has been recommended by the World Wide Web Consortium(W3C)as the standard language for querying RDF data.The RDF query problem based on SPARQL can be transformed into a subgraph homomorphism problem,which is NP-complete.In addition,for SPARQL queries with noise,how to return the Top-k results within a reasonable time becomes the most concerned problem of users.Therefore,how to efficiently execute SPARQL queries on large-scale RDF data is a challenging problem in knowledge graph data management.The main work of this thesis is as follows:1)For the SPARQL queries without noise,a query graph node matching strategy based on constraint size is proposed.On this basis,a tree search based RDF query algorithm RI-triples is given.This strategy introduce as many constraints and as early as possible in the matching phase to prune unmatched branches,avoid applying any complicated predictive pruning rules,and effectively reduce the search space and improve query efficiency.For the LUBM6 M,LUBM13M and LUBM33 M datasets,the total query time of RI-Triples are 0.59,0.54 and 0.74 times those of g Store,and 0.24,0.27 and 0.34 times those of RDF-3X,respectively.For the snowflake queries in the Wat Div10 dataset,the total query times of RI-triples is 0.028 times of g Store and 0.24 times of RDF-3X,respectively.Experimental results show that RI-triples has good query performance.2)For the complex SPARQL queries without noise and SPARQL queries with noise,a RDF Top-k query algorithm NBRQ based on neighbor-aware is given,which uses a statistical significance method to evaluate the similarity of the structure and the neighbor nodes labels of the query graph node and its candidates.In order to improve the similarity of query results,the h-hop neighbor nodes of candidate nodes are considered in the matching phase.The experiment uses the LUBM5 M dataset.For the complex SPARQL queries without noise,when the variable nodes ration are 5%?27%,the F1-score range of the NBRQ algorithm are 95.90%?82.40%;For the SPARQL queries with noise,when the noise ratio are 14%?33%,the F1-score range of the NBRQ algorithm are 95.70%?90.30%.Experimental results show that NBRQ algorithm is robust to SPARQL queries with noise,and can effectively deal with the approximate query of large-scale RDF data.
Keywords/Search Tags:RDF query, SPARQL, subgraph homomorphism, approximate subgraph matching
PDF Full Text Request
Related items