
Research On Large-scale RDF Data Query Method Based On Graph Clustering

Posted on: 2015-05-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y T Cui
Full Text: PDF
GTID: 2348330485493454
Subject: Computer Science and Technology
Abstract/Summary:
As a simple and highly extensible data model, the Resource Description Framework (RDF) is widely regarded as a data representation method for the World Wide Web and other research fields. The rapid growth in the volume of RDF data poses enormous challenges for RDF querying. Many open-source and commercial software tools already exist for storing and querying RDF data. Most of them use either a traditional relational database management system or a native RDF storage system to store and process the data. On large-scale RDF data, however, their query performance is far from satisfactory.

To improve query performance on large-scale RDF data, this thesis presents two query methods based on graph clustering: one designed for a single machine and one designed for a distributed system. Both use the best-performing existing graph clustering algorithm that can handle very large graphs to partition the large-scale RDF dataset. After partitioning, there should be many edges within each cluster and relatively few between clusters. The single-machine algorithm then filters the partitions according to the RDF query request, ignores the irrelevant ones, and executes the query only on the partitions that remain, which greatly improves query efficiency. The distributed algorithm divides the clusters into several groups and stores each group on a separate computing node. A dispatcher has every node execute the complete SPARQL query, then combines the results returned by the nodes and sends them to the client.

We implemented both algorithms and evaluated their performance on YAGO2, a representative large-scale RDF dataset. The experimental results indicate that, compared with simply using the most efficient RDF query engine, the proposed methods achieve high recall and precision and greatly improve query efficiency.
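The single-machine filtering step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how partitions are matched against a query, so everything below is an assumption for illustration: the toy triples, the precomputed `CLUSTER_OF` mapping (standing in for the output of a graph clustering algorithm), the rule of storing each triple in its subject's cluster, and the use of a single triple pattern (with `None` as a variable) in place of a full SPARQL query.

```python
from collections import defaultdict

# Toy RDF triples (subject, predicate, object); all names are hypothetical.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("berlin", "locatedIn", "germany"),
    ("munich", "locatedIn", "germany"),
]

# Assumed result of a graph clustering step: node -> cluster id.
CLUSTER_OF = {"alice": 0, "bob": 0, "carol": 0,
              "berlin": 1, "munich": 1, "germany": 1}


def build_partitions(triples, cluster_of):
    """Store each triple in its subject's cluster and record the terms seen there."""
    parts = defaultdict(list)
    terms = defaultdict(set)
    for s, p, o in triples:
        cid = cluster_of[s]
        parts[cid].append((s, p, o))
        terms[cid].update((s, p, o))
    return parts, terms


def relevant_clusters(pattern, terms):
    """Keep only clusters that contain every constant of the pattern."""
    constants = {t for t in pattern if t is not None}
    return sorted(cid for cid, seen in terms.items() if constants <= seen)


def match(pattern, triples):
    """Return the triples that agree with the pattern on every constant position."""
    return [t for t in triples
            if all(q is None or q == v for q, v in zip(pattern, t))]


def query(pattern, parts, terms):
    """Evaluate the pattern only on the clusters that survive filtering."""
    results = []
    for cid in relevant_clusters(pattern, terms):
        results.extend(match(pattern, parts[cid]))
    return results


parts, terms = build_partitions(TRIPLES, CLUSTER_OF)
```

In this toy setup, a pattern such as `(None, "locatedIn", "germany")` is evaluated against cluster 1 only, and cluster 0 is never scanned. A real system would need to handle multi-pattern queries and edges that cross cluster boundaries, which this sketch does not attempt.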
Keywords/Search Tags: Large-scale RDF data, graph clustering, SPARQL query, distributed system