
The Research Of RDF Semantic Data's Encoding For Storage And Query Optimization

Posted on: 2016-06-19
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Wang
GTID: 2348330488474107
Subject: Computer software and theory

Abstract/Summary:
As Semantic Web research deepens and information extraction technology improves, the volume of RDF semantic data has grown rapidly in recent years, and increasingly complex semantic data must be processed. Common RDF datasets contain hundreds of millions of triples, so storing and querying massive RDF data efficiently has become a hot topic in academia. Because RDF data volumes are so large, existing RDF storage systems encode the data, which compresses it substantially, but most of these encoding methods do not support incremental updates. Existing RDF storage systems also have fairly complete triple storage methods and high recall, but their query efficiency still leaves room for improvement. As an important area for future development, RDF semantic data processing is well worth studying.

This thesis focuses on RDF data encoding and storage techniques. Based on an analysis of the current state of research and on experiments with existing systems, it proposes improved methods for encoding and storing RDF semantic data. The research covers the following aspects:

First, based on the characteristics of RDF semantic data, this thesis designs and implements HBRA, a hash-based encoding scheme. HBRA uses hash lookup to map text terms to numeric IDs. The method supports both initial bulk loading and fast incremental updates, which suits the continuous growth of RDF data. An in-memory conflict table combined with Bloom filter lookups improves the efficiency of data search and data update.

Second, building on a study of RDF query processing techniques, the thesis proposes a triple-based storage method that uses secondary indexes, designs the storage structure of these secondary indexes, and implements the corresponding storage routines.
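The abstract does not specify the layout of the secondary indexes. As an illustration only, the sketch below assumes triples have already been encoded as integer IDs and uses a hypothetical two-level layout: predicate ID, then subject ID, then a sorted list of object IDs. The class name `SecondaryIndex` and its methods are invented for this example and are not taken from the thesis. The sketch shows why such a structure can cut comparisons: a pattern with a bound subject and predicate touches a single bucket and uses binary search, instead of being compared against every stored triple.

```python
from bisect import bisect_left
from collections import defaultdict


class SecondaryIndex:
    """Hypothetical two-level index over integer-encoded RDF triples.

    Level 1: predicate ID -> level-2 table.
    Level 2: subject ID -> sorted, duplicate-free list of object IDs.
    This layout is an illustrative assumption, not the thesis's design.
    """

    def __init__(self):
        self._index = defaultdict(lambda: defaultdict(list))

    def insert(self, s: int, p: int, o: int) -> None:
        objects = self._index[p][s]
        pos = bisect_left(objects, o)
        if pos == len(objects) or objects[pos] != o:
            objects.insert(pos, o)  # keep the bucket sorted and skip duplicates

    def objects(self, s: int, p: int) -> list:
        """Answer the pattern (s, p, ?o) with two hash lookups, no full scan."""
        # dict.get on a defaultdict does not create missing entries.
        return list(self._index.get(p, {}).get(s, []))

    def contains(self, s: int, p: int, o: int) -> bool:
        """Check a fully bound triple with a binary search inside one bucket."""
        objects = self._index.get(p, {}).get(s, [])
        pos = bisect_left(objects, o)
        return pos < len(objects) and objects[pos] == o

    def pairs(self, p: int) -> list:
        """Answer (?s, p, ?o) by walking only the buckets under predicate p."""
        level2 = self._index.get(p, {})
        return [(s, o) for s, objects in level2.items() for o in objects]


index = SecondaryIndex()
for triple in [(1, 10, 101), (1, 10, 100), (2, 10, 100), (1, 11, 200)]:
    index.insert(*triple)
```

Keying level 1 by predicate is only one possible choice; a real system would typically keep several such orderings so that patterns with different bound positions can each be answered from a single bucket.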
The secondary index structure not only reduces the storage space required for RDF data but also decreases the number of comparisons needed when answering semantic queries over RDF data. By studying the overall architecture and query mechanism of the query engine and combining them with the characteristics of the secondary indexes, this thesis designs the corresponding query processing for RDF semantic data.

Third, based on the proposed RDF encoding scheme and query strategy, the thesis improves the RDF-3X system and conducts related experiments. For encoding, experiments were run on the data encoding scheme. For query performance, nine SPARQL queries were executed against several storage systems on the YAGO dataset, and recall and query time were measured. The experimental results show that the new encoding scheme and storage techniques achieve the same recall as the baseline systems while improving efficiency on simple queries and parallel queries.
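The hash-based mapping behind HBRA, described in the first research aspect, can be illustrated with a small sketch. It is not the thesis's implementation: the BLAKE2b hash, the `hash_bits` width, the overflow-ID scheme for collisions and the Bloom filter parameters are all assumptions, and the names `HashEncoder` and `BloomFilter` are hypothetical. A term's ID is normally its truncated hash value; when two different terms collide, the later one receives an overflow ID recorded in an in-memory conflict table, and a Bloom filter lets most lookups skip that table.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter using double hashing over one BLAKE2b digest."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        digest = hashlib.blake2b(item.encode("utf-8"), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "big")
        # An odd step is coprime with a power-of-two num_bits.
        h2 = int.from_bytes(digest[8:], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True may be a false positive.
        return all(self.bits[pos >> 3] & (1 << (pos & 7))
                   for pos in self._positions(item))


class HashEncoder:
    """Hypothetical hash-based term dictionary in the spirit of HBRA."""

    def __init__(self, hash_bits: int = 40):
        assert 1 <= hash_bits <= 64
        self.hash_bits = hash_bits
        self.id_to_term = {}   # decode table: ID -> term
        self.conflicts = {}    # in-memory conflict table: term -> overflow ID
        self.conflict_filter = BloomFilter()
        self.next_overflow_id = 1 << hash_bits  # overflow IDs sit above the hash range

    def _hash_id(self, term: str) -> int:
        digest = hashlib.blake2b(term.encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "big") >> (64 - self.hash_bits)

    def encode(self, term: str) -> int:
        # Only terms that once collided live in the conflict table; a
        # negative Bloom filter answer skips that table entirely.
        if self.conflict_filter.might_contain(term) and term in self.conflicts:
            return self.conflicts[term]
        tid = self._hash_id(term)
        owner = self.id_to_term.get(tid)
        if owner is None:
            self.id_to_term[tid] = term  # new term: its hash value becomes its ID
            return tid
        if owner == term:
            return tid                   # already encoded under its hash ID
        # Collision with a different term: allocate an overflow ID.
        tid = self.next_overflow_id
        self.next_overflow_id += 1
        self.id_to_term[tid] = term
        self.conflicts[term] = tid
        self.conflict_filter.add(term)
        return tid

    def decode(self, tid: int) -> str:
        return self.id_to_term[tid]

    def bulk_load(self, terms) -> list:
        """Initial loading and incremental updates share the same path."""
        return [self.encode(t) for t in terms]
```

Because encoding an existing term only re-derives its hash (or hits the conflict table), new triples can be appended at any time without rebuilding the dictionary, which is the incremental-update property the abstract emphasizes. Using a tiny `hash_bits` such as 8 is a convenient way to force collisions when exercising the conflict-table path.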
Keywords/Search Tags: Semantic Web, Data Encoding, Query Optimization, RDF