
Research And Implementation Of The Query Processing Algorithms For Web-scale RDF Data

Posted on: 2015-10-09
Degree: Master
Type: Thesis
Country: China
Candidate: X D Ye
GTID: 2308330482954491
Subject: Computer application technology
Abstract/Summary:
Today, the lack of semantic information attached to network resources is one of the main limitations on the development of the Internet. Because the Web is organized around hyperlinks, it can display resources but cannot recognize their meaning. RDF (Resource Description Framework), proposed by the W3C, has become the standard description framework of the Semantic Web. With the development of information extraction technology and the Semantic Web, a growing amount of RDF data is appearing on the Web. The storage, management, and retrieval of large RDF data sets has therefore become a difficult problem that urgently needs to be addressed. SPARQL, also proposed by the W3C, is the standard query language for RDF data.

Existing algorithms for RDF queries face the following challenges. (1) They cannot answer SPARQL queries with wildcards in a scalable manner. (2) They cannot handle frequent updates to RDF repositories efficiently. (3) They cannot support large data sets. To address these three problems, we propose algorithms based on indexes and on distributed environments.

First, in Chapter 3, we propose an index-based algorithm. (1) We use a graph model, i.e., adjacency lists, to store RDF data. (2) Based on the RDF structure, we add a label to each entity vertex and class vertex. We then develop a novel index, the VS*-tree, to search the label information efficiently. The index has a low maintenance cost and is easy to update. (3) Using the label information of the RDF data, we propose a pruning algorithm that can be embedded into text query algorithms. The pruning algorithm applies not only to general SPARQL queries but also to SPARQL queries with wildcards.

Second, according to the characteristics of RDF data, we propose: (1) leveraging state-of-the-art single-node RDF-store technology; (2) partitioning the data across nodes in a manner that helps accelerate query processing through locality optimizations; (3) decomposing SPARQL queries into high-performance fragments that take advantage of how data is partitioned in a cluster.

Finally, extensive experiments confirm the efficiency and effectiveness of our solution...
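
As a rough illustration of the adjacency-list storage and wildcard matching mentioned above, the following Python sketch may help. It is not the thesis's algorithm: it omits the VS*-tree index and the pruning step entirely, and it assumes that a "wildcard" means a glob-style pattern on the object value, which the abstract does not specify. The class name `AdjacencyListStore`, its methods, and the `ex:` sample data are all hypothetical.

```python
from collections import defaultdict
from fnmatch import fnmatchcase


class AdjacencyListStore:
    """Toy RDF store: each subject vertex keeps a list of (predicate, object) edges."""

    def __init__(self):
        self.out_edges = defaultdict(list)

    def add(self, subject, predicate, obj):
        self.out_edges[subject].append((predicate, obj))

    def match(self, subject=None, predicate=None, obj_pattern="*"):
        """Return triples matching the pattern.

        None means "any subject/predicate". obj_pattern is a glob-style
        pattern (an assumption about what a wildcard query looks like).
        """
        # A bound subject restricts the scan to a single adjacency list.
        if subject is None:
            subjects = list(self.out_edges)
        elif subject in self.out_edges:
            subjects = [subject]
        else:
            subjects = []
        return [
            (s, p, o)
            for s in subjects
            for p, o in self.out_edges[s]
            if predicate in (None, p) and fnmatchcase(o, obj_pattern)
        ]


store = AdjacencyListStore()
store.add("ex:Einstein", "ex:bornOn", "1879-03-14")
store.add("ex:Einstein", "ex:name", "Albert Einstein")
store.add("ex:Bohr", "ex:bornOn", "1885-10-07")

# Wildcard query: anything born in 1879.
born_1879 = store.match(predicate="ex:bornOn", obj_pattern="1879-*")
```

Without an index, a query with an unbound subject scans every adjacency list, which is the kind of cost that a label index and pruning step would aim to reduce.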
Keywords/Search Tags: Semantic Web, RDF Data, SPARQL, Distributed