
Research On Distributed Storage Scheme Of RDF Large Graph Data

Posted on: 2019-11-13
Degree: Master
Type: Thesis
Country: China
Candidate: Y Gan
GTID: 2428330626952098
Subject: Computer technology

Abstract/Summary:
With the development of the Semantic Web, RDF data is used in an increasing range of scenarios, and its scale keeps growing. To exploit the value of this data and meet the demands of RDF large graph data, we need a distributed loading algorithm, a storage scheme, and a query algorithm that can process RDF large graphs.

To improve the loading speed of RDF large graph data, a distributed RDF data loading algorithm is designed. The key step of the loading process is coloring the predicates to determine their storage locations. Since graph coloring is a classic NP-complete problem, we design a distributed graph coloring algorithm based on Pregel to accelerate this step. Because relational databases are mature and offer many advantages, we propose using relations to store and manage RDF data. RDF data stored in relational tables suffers from data sparsity and schema variability, so we design a relation-based distributed storage scheme to address these problems, together with a dictionary index optimization scheme and a partition optimization scheme that further improve storage and query performance. The distributed graph coloring algorithm and the distributed RDF data loading algorithm are implemented on Spark GraphX and are used to load the data into the distributed data processing engine HAWQ.

Experimental results on the synthetic dataset LUBM and the real dataset DBpedia show that the JP-Pregel and LDF-Pregel algorithms reduce coloring time by an average of 26.4% and 30.9%, respectively, compared with the MIS-Pregel algorithm. The storage scheme reduces storage space and accelerates queries. On LUBM200, the uncompressed storage scheme reduces space by 16.4% compared with the original RDF data, and the optimization schemes reduce space consumption by 29.3%. The compressed storage scheme consumes only 9.31% of the space of the original RDF data. Overall, the experimental results show that the distributed storage scheme improves both storage space and query time.
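
The role of predicate coloring in choosing a storage location can be illustrated with a small sketch. It is not the thesis implementation: it runs sequentially rather than on Pregel or Spark GraphX, and it assumes (the abstract does not say so) that two predicates conflict when they occur on the same subject, so that a predicate's color can serve as its column index in a wide relational table. The greedy largest-degree-first ordering stands in for the LDF idea only in its simplest form, and all function names and sample triples are hypothetical.

```python
from collections import defaultdict

def build_conflict_graph(triples):
    """Link two predicates when they occur on the same subject (assumed rule)."""
    by_subject = defaultdict(set)
    for s, p, _o in triples:
        by_subject[s].add(p)
    graph = defaultdict(set)
    for preds in by_subject.values():
        for p in preds:
            graph[p] |= preds - {p}
    return graph

def ldf_coloring(graph):
    """Sequential greedy coloring, visiting predicates by descending degree."""
    order = sorted(graph, key=lambda p: (-len(graph[p]), p))
    color = {}
    for p in order:
        used = {color[q] for q in graph[p] if q in color}
        c = 0
        while c in used:  # smallest color not taken by a neighbor
            c += 1
        color[p] = c
    return color

def to_rows(triples, color):
    """Place each (predicate, object) pair in the column given by its color.

    Simplification: assumes each predicate has at most one value per subject.
    """
    width = max(color.values(), default=-1) + 1
    rows = {}
    for s, p, o in triples:
        row = rows.setdefault(s, [None] * width)
        row[color[p]] = (p, o)
    return rows

# Hypothetical sample data, for illustration only.
triples = [
    ("alice", "name", "Alice"),
    ("alice", "age", "30"),
    ("bob", "name", "Bob"),
    ("bob", "email", "bob@example.org"),
]
graph = build_conflict_graph(triples)
color = ldf_coloring(graph)
rows = to_rows(triples, color)
```

In this sample, "age" and "email" never occur on the same subject, so they can share a column, and the table needs two columns instead of three. This column sharing is how coloring can reduce the sparsity that a one-column-per-predicate layout would produce.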
Keywords/Search Tags: RDF, Large graph data, Storage scheme, Pregel, HAWQ