
Optimization And System Implementation For Large Scale Semantic Data Storage And Query

Posted on: 2016-11-25
Degree: Master
Type: Thesis
Country: China
Candidate: H J Qiu
Full Text: PDF
GTID: 2428330461456857
Subject: Computer technology
Abstract/Summary:
The World Wide Web is widely used in daily life and has become the main way of obtaining information. The Semantic Web was created to allow computers to understand web information. It adds semantic support to the existing web and extends it to enable efficient information sharing and collaborative machine intelligence. On the one hand, storing and querying semantic data efficiently is important, and more and more applications require an efficient storage structure for semantic data. On the other hand, the rapid growth of semantic data and the development of big data technology offer a new way to solve the problem of storing and querying semantic data. However, the traditional approach to semantic data management is to store and query the data in relational database management systems, which cannot handle the data well as its volume grows. This thesis optimizes a hierarchical storage system for semantic data, built on the OpenRDF Sesame framework, that keeps persistent data in HBase and hot data in Redis in order to store and query large-scale semantic data. The RDF storage mechanism is optimized by adopting an attribute table in place of the RDF triple store. In addition, a layer of optimized hash conversion is proposed to avoid spending time on frequent hash table searches during the query stage. The thesis consists mainly of the following parts.

(1) Optimization and implementation of the RDF storage mechanism based on the attribute table. When storing and querying semantic data, the RDF triple store suffers from inadequate efficiency and scalability and from a low rate of storage space utilization. This thesis proposes a method that stores semantic data by attribute, so that related semantic data are stored together in one table. The optimal minimum threshold is obtained using the ASSO algorithm. To cope with large volumes of semantic data, a parallel frequent itemset mining algorithm on the Spark framework is proposed to generate the index of the attribute table. The experimental results show that, compared to the RDF triple store, the proposed storage mechanism improves query performance by about 0.2 to 1 times.

(2) Optimization of hash conversion during the query stage. In the existing implementation, which indexes RDF data with a hash table, the query engine frequently searches the hash table and performs conversions to obtain the query results, which leads to low query efficiency. To solve this problem, a layer of optimized hash conversion is proposed. With it, the hash table only needs to be searched at the beginning of the query stage and at the end of the result-return stage, which avoids the time spent on frequent hash table searches. The experimental results show that the proposed optimization achieves a 1 to 7 times speedup and better scalability compared to the original system.

(3) Based on the above optimizations, a prototype large-scale RDF data management system is designed and implemented. The experimental results show that the optimizations proposed in this thesis are effective and scalable and can efficiently store and query large-scale RDF data. The overall performance of the optimized system achieves a 1 to 8 times speedup compared to the original Rainbow System.
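The hash conversion idea in part (2) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the class names (TermDictionary, EncodedStore), the in-memory list of triples, and the linear scan are all assumptions made for illustration. The only point it shows is that the term-to-ID table is consulted once for the constants of a query pattern and once more when the final results are decoded, while matching itself runs purely on integer IDs.

```python
from typing import Dict, List, Optional, Tuple

IdTriple = Tuple[int, int, int]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


class TermDictionary:
    """Hypothetical bidirectional map between RDF terms and integer IDs."""

    def __init__(self) -> None:
        self._to_id: Dict[str, int] = {}
        self._to_term: List[str] = []

    def encode(self, term: str) -> int:
        """Return the ID for a term, assigning a new one if it is unseen."""
        if term not in self._to_id:
            self._to_id[term] = len(self._to_term)
            self._to_term.append(term)
        return self._to_id[term]

    def find(self, term: str) -> int:
        """Return the ID for a known term, or -1 without inserting it."""
        return self._to_id.get(term, -1)

    def decode(self, term_id: int) -> str:
        return self._to_term[term_id]


class EncodedStore:
    """Toy in-memory store that keeps triples as integer IDs only."""

    def __init__(self) -> None:
        self.dictionary = TermDictionary()
        self.triples: List[IdTriple] = []

    def add(self, s: str, p: str, o: str) -> None:
        d = self.dictionary
        self.triples.append((d.encode(s), d.encode(p), d.encode(o)))

    def query(self, pattern: Pattern) -> List[Tuple[str, str, str]]:
        d = self.dictionary
        # Start of the query stage: convert pattern constants to IDs once.
        # None marks a variable position and stays None.
        wanted = tuple(None if t is None else d.find(t) for t in pattern)
        # Matching works on integers only; no dictionary access here.
        matches = [
            triple
            for triple in self.triples
            if all(w is None or w == got for w, got in zip(wanted, triple))
        ]
        # End of the result-return stage: convert IDs back to terms once.
        return [(d.decode(s), d.decode(p), d.decode(o)) for s, p, o in matches]


if __name__ == "__main__":
    store = EncodedStore()
    store.add("ex:alice", "ex:knows", "ex:bob")
    store.add("ex:bob", "ex:knows", "ex:carol")
    store.add("ex:alice", "ex:age", "30")
    for row in store.query((None, "ex:knows", None)):
        print(row)
```

In a multi-pattern query, intermediate joins would likewise operate on IDs, so the saving grows with the number of intermediate results that never need to be decoded.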
Keywords/Search Tags: big data, RDF, semantic web, hierarchical storage, query optimization