Research On Large-scale RDF Data Query Method Based On Graph Clustering

Posted on:2015-05-20

Degree:Master

Type:Thesis

Country:China

Candidate:Y T Cui

GTID:2348330485493454

Subject:Computer Science and Technology

Abstract/Summary:

As a simple and highly extensible data model, the Resource Description Framework (RDF) has been widely adopted as a data representation method for the World Wide Web and other research fields. The rapid growth of RDF data poses enormous challenges for RDF query processing. Many open-source and commercial software tools are now available for storing and querying RDF data. Most of them rely on either a traditional relational database management system or a native RDF storage system to store and process the data. However, their performance on large-scale RDF queries is far from satisfactory.

To improve query performance on large-scale RDF data, this thesis presents two query methods based on graph clustering: one designed for a single machine and the other for a distributed system. Both use an existing, best-performing graph clustering algorithm that can handle very large graphs to partition the large-scale RDF dataset. After partitioning, there should be many edges within each cluster and relatively few between clusters. The single-machine algorithm then filters the partitions according to the RDF query request, ignores the irrelevant ones, and executes the query on the remaining partitions, which can greatly improve query efficiency. The distributed algorithm divides the partitioned clusters into several groups and stores the groups on the respective computing nodes. A dispatcher has every node execute the complete SPARQL query, then combines the results returned by the nodes and sends them to the client.

We implemented these algorithms and evaluated their performance on YAGO2, a representative large-scale RDF dataset. The experimental results indicate that, compared with simply using the most efficient RDF query engine, the proposed methods achieve high recall and precision and greatly improve query efficiency.
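
The partition, filter, and dispatch steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the toy triples, the precomputed `CLUSTER_OF` mapping (standing in for the output of a graph clustering algorithm), the single-triple-pattern query form, and all function names are assumptions made for illustration. Real SPARQL queries with several patterns that span clusters are the harder case and are not handled here.

```python
from collections import defaultdict

# Hypothetical toy data: (subject, predicate, object) triples.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "livesIn", "paris"),
    ("dave", "worksAt", "acme"),
    ("erin", "worksAt", "acme"),
]

# Assumed result of a graph clustering step: node -> cluster id.
CLUSTER_OF = {
    "alice": 0, "bob": 0, "carol": 0, "paris": 0,
    "dave": 1, "erin": 1, "acme": 1,
}


def build_partitions(triples, cluster_of):
    """Group triples by the cluster of their subject node."""
    partitions = defaultdict(list)
    for s, p, o in triples:
        partitions[cluster_of[s]].append((s, p, o))
    return dict(partitions)


def matches(triple, pattern):
    """A pattern is a triple in which None acts as a variable."""
    return all(q is None or q == t for t, q in zip(triple, pattern))


def is_relevant(partition, pattern):
    """Cheap filter: each constant in the pattern must occur in the
    same position in at least one triple of the partition."""
    for pos, q in enumerate(pattern):
        if q is not None and all(t[pos] != q for t in partition):
            return False
    return True


def query_single_machine(partitions, pattern):
    """Skip irrelevant partitions, then scan only the remaining ones."""
    hits, scanned = [], []
    for cid in sorted(partitions):
        part = partitions[cid]
        if not is_relevant(part, pattern):
            continue
        scanned.append(cid)
        hits.extend(t for t in part if matches(t, pattern))
    return hits, scanned


def query_distributed(node_groups, partitions, pattern):
    """Simulated dispatcher: each group of cluster ids plays the role of
    one computing node; every node runs the whole query on its own
    partitions and the dispatcher concatenates the answers."""
    merged = []
    for group in node_groups:
        local = {cid: partitions[cid] for cid in group}
        hits, _ = query_single_machine(local, pattern)
        merged.extend(hits)
    return merged


partitions = build_partitions(TRIPLES, CLUSTER_OF)
hits, scanned = query_single_machine(partitions, (None, "worksAt", "acme"))
```

In this toy run only the cluster containing `acme` is scanned, which is the source of the speed-up the abstract refers to. Assigning each triple to the cluster of its subject is one simple choice among several and is likewise an assumption of the sketch.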

Keywords/Search Tags:

Large-scale RDF data, graph clustering, SPARQL query, distributed system

Related items

1	Research On Large-scale RDF Data Query Algorithm Based On Graph
2	Research On Key Techniques Of Query Processing Over Large-scale Graph Data
3	An Analytical System For Large Scale Semantic Data
4	Research On Subgraph Query Method For Large-scale Dynamic And Directed Label Graph
5	Distributed Query System For Large Scale Knowledge Graph
6	Research On Query And Retrieval Techniques On Distributed Knowledge Graph
7	Unified SPARQL Query-analyzing Language Design And Theoretical Research
8	Research Of Approximate Query Processing Technology For Large Scale Data
9	Constraint Top-k Query For Large-scale Dynamic Graph Based On Frequent Subgraph
10	Research And Application Of Clustering Algorithms For Large Scale Data Sets