Font Size: a A A

Top-k Query And Optimization On RDF Semantic Data

Posted on:2016-08-08Degree:DoctorType:Dissertation
Country:ChinaCandidate:W L WuFull Text:PDF
GTID:1318330482457970Subject:Computer system architecture
Abstract/Summary:PDF Full Text Request
The semantic world wide web specifies the information with definite structure and semantics, so that machines can not only display these data but also understand, handle, and organize them. In recent years, with the fast development of the Link Open Data and DBpedia and other projects, semantic web data sources increase significantly and a large number of graph data in form of RDF (Resource Description Framework) are published. The web data is rapidly changing from the "page" full of text to a set with abundant entities and rich relationships between entities. Under this background, the emergence of new data is bound to lead to the development of new techniques for data processing and studying on semantic data, especially the RDF data, has become a frontier and hot subject.Currently, that the amount of RDF data sharply increases and these data are widely usered greatly promote the development of the associated storage, query, and indexing technologies. The existing indexes, retrieval models, algorithms and query optimization technologies enable efficient queries that return top-1 result or top-1 set of exactly matching results for some specific aspects. However, the diverse requirments in RDF query and the inadequate query patterns lead these efficient retrieval methods to face the problems in top-k query that they can't solve. Therefore, studying on top-k query in RDF graphs has become increasingly urgent for covering the diverse requirments of users. In the meanwhile, RDF graphs has many characteristics, such as the text information-rich, strong semantic representation, so many correlative triples, huge amount of data, which bring an enormous challenge for the efficient top-k query.This paper addresses the core issue on query and optimization in RDF graphs and proposes a new cardinality estimation for top-k probabilistic query after analyzing and identifying the advantages and disadvantages of the existing algorithms. At the meanwhile, the unsolved top-k shortest path problem is arised from the top-k probabilistic query. We design a new index and implement the answering framework for top-k shortest path query with considering the characteristics of RDF graph itself, the information reflected from RDFs and its inter-relationship, trying to introduce cardinality estimation method to ameliorate the efficiency of query process. But this estimation method doesn't work well since the difference between path query and selection query. So, this paper uses pruning optimization to solve this efficiency problem. Our pruning methods improve the efficiency of top-k path query, but cannot be able guarantee the quality of results. This framework often returns not enough results, even null. Therefore, we propose the concept of semantic distance, which forms the semantic-based measure, and implement the top-k approximation query for RDF graphs.Specifically, new contributions of this paper are as follows:1. Cardinality estimation algorithm for top-k query on RDF graphs. The previous cardinality estimation methods assume that the RDF graph is determined and the correlations between subqueries are independent. Under these assumptions, the prior arts process the estimation for arbitrary query. Since the assumptions are inconsistent with the actual situation, their estimations are often inaccurate. This paper focuses on this problem and proposes an estimation method based on Bayesian probability model. The experiments on dataset YAGO, LUBM, and DBLP indicate that our method effectively processes the selectivity cardinality estimation in query for probabilistic RDF graphs, and that the estimation results outperform existing approaches in accuracy, and that its performance is acceptable.2. Top-k shortest path query in RDF graphs basing on semantic compression index. This paper proposes and constructs a novel component-based index for path queries by using the semantic and structural information from RDF graphs and RDFs. Based on this index, we specific an approach to answer the top-k shortest path query. To ameliorate the efficiency of query process, we propose optimizing methods, including frequent paths and structural pruning strategies. Finally, we designed control experiments on real dataset YAGO to evaluate our technologies. Results indicate that our approach can efficiently construct the index and effectively return top-k shortest paths. Its response time is much shorter than the prior art and the index takes up not much space. In addition, we test our framework on large graphs and discuss the availability of this framework in a dynamic database.3. Top-k approximating query on RDF graph basing on semantic similarity. The existing works are mainly base on the distance similarity to measure the query and graphs without considering the semantic similarity. For the approximating query on semantic graphs, disregarding the semantic similarity may fail the query. In this paper, we propose a novel ontology-based measure for approximating query on RDF graphs, considering the semantic similarity. In the meanwhile, we specify a semantic structural pruning strategy to ameliorate the efficiency of query process. Finally, we construct the query framework to answer approximating query, and design experiments on dataset LUBM to test our methods under this framework. Results show that the approaches in this paper can efficiently execute approximating query on RDF graphs and effectively return top-k results and avoid the NP-hard problem.
Keywords/Search Tags:Semantic Data, Query, Query Optimization, RDF graph, Top-k Query
PDF Full Text Request
Related items