
Research And Implementation Of Large Collections Of RDF Data Storage And Retrieval Technology On HBase

Posted on: 2018-09-02
Degree: Master
Type: Thesis
Country: China
Candidate: K Zhang
GTID: 2348330518998942
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of the Semantic Web and improvements in information extraction technology, RDF data is being generated at an ever faster rate. Common RDF datasets now contain hundreds of millions of triples. Storing and retrieving data at this scale efficiently has become an urgent problem. Traditional centralized management cannot keep up with the rapid growth of RDF data. Existing distributed management schemes have problems of their own: they consume too much storage space in exchange for query efficiency, and they do not support reasoning queries. To address these shortcomings in RDF data storage and retrieval, this thesis designs and implements a scheme for storing RDF data and ontology data on HBase, then designs and implements a SPARQL query method for that storage scheme, and finally verifies the validity and correctness of the scheme through detailed experiments. The specific research contents are as follows:

(1) To reduce storage space, this thesis proposes encoding RDF data with an MMH coding scheme. Analysis of the RDF data showed that many strings are repeated. After comparing several hash algorithms, the Murmur Hash algorithm was chosen to encode the strings in RDF data, which effectively reduces the storage space required after encoding.

(2) An HBase-based storage scheme for RDF data was designed and implemented. First, the relationships between classes and attributes in the ontology file are parsed and stored in the corresponding HBase table. This preserves the implicit relationships among the RDF data and ensures the integrity of the stored data. Taking the characteristics of RDF data storage and query into account, a two-table storage scheme was designed, which guarantees RDF query efficiency while using less space.

(3) A SPARQL parser and query algorithms for this storage model were designed and implemented. The SPARQL parser preprocesses the query statement and completes the reasoning process to ensure the integrity of the data. Query algorithms for eight different forms of triple pattern were designed and implemented, along with a BGP (basic graph pattern) reasoning query algorithm based on a greedy strategy. The intermediate result set with the lower cost is selected first, which reduces the time spent on Spark join operations and improves query efficiency.

(4) The validity and correctness of the proposed storage and query scheme were verified and compared with other schemes. LUBM datasets of six different sizes and eight different SPARQL queries were used to test storage space, load time and query time in detail. The results were compared with those of other RDF storage schemes, leading to the conclusion that the proposed scheme is effective and correct.
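
The eight triple pattern forms and the greedy ordering mentioned in item (3) can be pictured with a small Python sketch. It is an illustration under stated assumptions, not the algorithm from the thesis: the function names, the cost estimates and the rule of preferring patterns that share a variable with those already chosen are all hypothetical, and the abstract does not say how costs are computed.

```python
from itertools import product

# Each position of a triple pattern is either bound (a constant) or a
# variable (written "?name"), which gives 2^3 = 8 possible forms.
ALL_FORMS = {"".join(f) for f in product(*[(c, "?") for c in "spo"])}


def is_var(term):
    return term.startswith("?")


def pattern_form(tp):
    """Describe a triple pattern, e.g. '?po' = variable subject, bound p and o."""
    return "".join("?" if is_var(t) else c for c, t in zip("spo", tp))


def variables(tp):
    return {t for t in tp if is_var(t)}


def greedy_order(patterns, cost):
    """Order patterns greedily by estimated cost (hypothetical helper).

    The cheapest pattern goes first. After that, the cheapest pattern that
    shares a variable with the ones already chosen is preferred, so each
    join has a common key; if none is connected, the cheapest overall wins.
    """
    remaining = list(patterns)
    ordered, bound = [], set()
    while remaining:
        connected = [tp for tp in remaining if variables(tp) & bound]
        best = min(connected or remaining, key=cost)
        ordered.append(best)
        bound |= variables(best)
        remaining.remove(best)
    return ordered


# A LUBM-flavoured basic graph pattern with made-up cost estimates
# (for example, estimated intermediate result sizes).
bgp = [
    ("?x", "rdf:type", "ub:Student"),
    ("?x", "ub:takesCourse", "?y"),
    ("ub:Prof0", "ub:teacherOf", "?y"),
]
estimated_cost = {bgp[0]: 1000, bgp[1]: 5000, bgp[2]: 3}

plan = greedy_order(bgp, estimated_cost.get)
for tp in plan:
    print(pattern_form(tp), tp)
```

In this example the `ub:teacherOf` pattern is evaluated first because its estimate is smallest, and `ub:takesCourse` follows because it shares `?y` with it, even though the `rdf:type` pattern has a lower estimate.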
Keywords/Search Tags:Semantic web, RDF storage, Reasoning query, HBase