Research On Highly Scalable RDF Data Storage System

Posted on: 2013-08-22
Degree: Master
Type: Thesis
Country: China
Candidate: P Liu
Full Text: PDF
GTID: 2248330392956214
Subject: Computer system architecture
Abstract/Summary:
Because RDF (Resource Description Framework) data is flexible to express and easy to interchange, its volume is growing rapidly. Traditional RDF storage systems either use an RDBMS as the storage backend or adopt native storage. However, these approaches do not focus on scalability. Some systems sacrifice storage space to overcome this problem, but their redundant data and non-compact storage schemes lead to low efficiency during query plan generation and query execution.

TripleBit aims to provide an efficient method for storing and querying large-scale RDF data in several respects. RDF data can be represented as a correlation bit-matrix, and different compression algorithms are applied to the data tables according to each table's characteristics and function in order to reduce storage space. In addition, a memory-based storage layer is employed to reduce I/O cost. The data tables are partitioned into chunks, which both facilitates buffer management and makes the data more compact, thereby accelerating query processing. To speed up searching, two indexes are designed: one locates the data chunk, and the other retrieves the candidate records for query patterns whose predicate is unknown. At query time, a simple but efficient query plan generator based on heuristic rules produces the query plan. During execution, different strategies are used according to the type of query plan, and a parallel subsystem is used to speed up the join operator. In addition, a two-stage execution strategy is used for multiple-variable queries to reduce intermediate results.

TripleBit is evaluated against RDF-3X, a state-of-the-art RDF storage engine. Experimental results demonstrate that TripleBit saves at least 40% of storage space while improving query processing speed by at least a factor of three. Extensive experiments show that the indexes and the query plan generator contribute substantially to this performance.
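
The bit-matrix and chunking ideas described above can be illustrated with a minimal Python sketch. It is not TripleBit's implementation: the abstract does not specify the matrix layout, so the sketch assumes each triple becomes a column with two set bits (the subject row and the object row), grouped by predicate, and stores only the two row IDs per column. The function names, the chunk size, and the min/max chunk test are all hypothetical.

```python
from collections import defaultdict


def build_bit_matrix(triples):
    """Assign each entity a row ID; store each triple as a (subject, object) ID pair.

    Assumption: a pair stands for a matrix column whose only set bits are
    the subject row and the object row, so the two IDs describe the column.
    """
    entity_ids = {}
    columns_by_predicate = defaultdict(list)
    for s, p, o in triples:
        s_id = entity_ids.setdefault(s, len(entity_ids))
        o_id = entity_ids.setdefault(o, len(entity_ids))
        columns_by_predicate[p].append((s_id, o_id))
    return entity_ids, dict(columns_by_predicate)


def chunk_pairs(pairs, chunk_size):
    """Sort the pairs by subject ID and split them into fixed-size chunks."""
    ordered = sorted(pairs)
    return [ordered[i:i + chunk_size] for i in range(0, len(ordered), chunk_size)]


def find_by_subject(chunks, s_id):
    """Scan only chunks whose subject-ID range can contain s_id."""
    hits = []
    for chunk in chunks:
        if chunk[0][0] <= s_id <= chunk[-1][0]:
            hits.extend(pair for pair in chunk if pair[0] == s_id)
    return hits


triples = [("a", "knows", "b"), ("b", "knows", "c"), ("a", "likes", "c")]
entity_ids, columns = build_bit_matrix(triples)
knows_chunks = chunk_pairs(columns["knows"], chunk_size=1)
matches = find_by_subject(knows_chunks, entity_ids["b"])
```

Keeping each chunk sorted lets a lookup skip chunks by inspecting only their first and last subject IDs, which is one plausible reading of how a chunk-locating index helps; the thesis itself may use a different structure.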
Keywords/Search Tags: Resource Description Framework, Semantic data representation, Query processing, Data compression, Indexing