
Skyline Query Research For Massive RDF Data Under Distributed Computing Environments

Posted on: 2016-11-20
Degree: Master
Type: Thesis
Country: China
Candidate: Q Li
GTID: 2308330461950875
Subject: Computer application technology
Abstract/Summary:
With the introduction of the Semantic Web and the development and maturation of related technologies, the field has attracted wide attention and active participation from academia, government, and industry. Linked Data, as the best practice of the Semantic Web, has increasingly become a hot topic. The Resource Description Framework (RDF) has become the de facto standard for Linked Data, providing a uniform way to describe all network resources. Documents annotated with formal semantic information can be understood and processed by computers, which greatly improves the accuracy and efficiency of information retrieval. As the volume of RDF data grows, data mining and data management for RDF have become research hotspots. As a typical multi-objective optimization query, the Skyline query can support user decision-making, and it has therefore been widely studied. In this thesis, we study Skyline query algorithms on massive RDF data. First, we propose a new optimized query method for RDF data, which divides the query process into two parts. In the first part, to reduce the time spent on dominance checks during Skyline computation, we design a filtering strategy for candidate Skyline points based on the characteristics of the RDF storage model, which prunes in advance some points that do not belong to the Skyline. In the second part, we provide a parallel Skyline query algorithm based on the MapReduce framework. Second, we study K-dominated Skyline query algorithms for high-dimensional data and propose two such algorithms. The first divides the data into blocks according to dominance capability, computes the partial K-dominated points of each block, and then merges the partial results to compute the final K-dominated points.
The other is a space-partitioning query algorithm that exploits the relationship between the data space and K-dominance to compute the dominating set. Finally, we conduct experiments on large data sets to verify efficiency. The experiments show that, compared with existing Skyline and K-dominated query algorithms, the proposed MapReduce-based Skyline query algorithm and K-dominated Skyline algorithm effectively improve query efficiency.
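The dominance test, the partition-then-merge pattern, and the K-dominance relation mentioned in the abstract can be illustrated with a minimal single-process sketch. It is not the thesis's implementation: it assumes points are plain numeric tuples where smaller values are better, it omits the RDF-specific candidate filter, and it only imitates the map and reduce phases with ordinary lists rather than a MapReduce framework. All function names (`dominates`, `k_dominates`, `skyline`, `partitioned_skyline`, `k_dominant_skyline`) are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """p dominates q: no worse in every dimension, strictly better in one.
    Assumption: smaller values are better in all dimensions."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def k_dominates(p: Point, q: Point, k: int) -> bool:
    """p k-dominates q: p is no worse in at least k dimensions and strictly
    better in at least one of them."""
    no_worse = sum(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse >= k and strictly_better


def skyline(points: Sequence[Point]) -> List[Point]:
    """Brute-force skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def partitioned_skyline(points: Sequence[Point], num_partitions: int) -> List[Point]:
    """Imitates a two-phase parallel skyline on one machine."""
    # "Map" phase: split the data and compute a local skyline per partition.
    partitions = [points[i::num_partitions] for i in range(num_partitions)]
    local_skylines = [skyline(part) for part in partitions]
    # "Reduce" phase: merge the local candidates and filter them once more.
    candidates = [p for part in local_skylines for p in part]
    return skyline(candidates)


def k_dominant_skyline(points: Sequence[Point], k: int) -> List[Point]:
    """Brute-force: keep every point that no other point k-dominates."""
    return [p for p in points if not any(k_dominates(q, p, k) for q in points)]


if __name__ == "__main__":
    hotels = [(1, 9), (3, 3), (9, 1), (4, 4), (5, 8)]  # e.g. (price, distance)
    print(sorted(skyline(hotels)))
    print(sorted(partitioned_skyline(hotels, 2)))

    items = [(1, 1, 5), (2, 2, 1), (3, 3, 3)]
    print(k_dominant_skyline(items, 2))
```

The merge in `partitioned_skyline` is safe because ordinary dominance is transitive: a point discarded in one partition cannot be needed to eliminate a candidate from another. K-dominance is not transitive, so merging partial K-dominated results generally needs more care than this; the brute-force `k_dominant_skyline` sidesteps the issue by comparing every point against the full data set.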
Keywords/Search Tags:Skyline queries, K-dominated Skyline queries, RDF data, MapReduce framework, cloud computing