Research On Parallel Partitioning And Distributed Processing System Of Large-scale RDF Data

Posted on:2016-05-08

Degree:Master

Type:Thesis

Country:China

Candidate:C F Xie

Full Text:PDF

GTID:2348330479453380

Subject:Computer system architecture

Abstract/Summary:

Because the RDF (Resource Description Framework) data model is flexible and scalable, more and more communities publish their data in RDF format, and the distributed storage and processing of RDF data has become a research hotspot. Existing solutions have made progress, but most of them focus on distributed storage design and processing optimization and largely disregard workload balance and the minimization of network traffic.

ParTripleBit, a hypergraph-based parallel traversal tree partitioning and distributed processing system, is a technique for partitioning and processing large-scale RDF data efficiently. It abstracts RDF data with a hypergraph model and then uses a traversal tree partitioning scheme to place the basic divisions onto several compute nodes in parallel, which preserves the relations between entities. A triple placement strategy keeps both the data load and the workload balanced across compute nodes. A heuristic scheme decomposes query tasks while minimizing the decomposition. For interaction between nodes, ParTripleBit uses the asynchronous, non-blocking communication model provided by MPI, together with a block-level variable-length integer delta compression scheme and a parallel pipeline. A lock-free work-stealing scheduler schedules the query tasks. When intermediate results are collected, a batch merge operation reduces the number of key comparisons.

ParTripleBit performs well compared with five state-of-the-art RDF engines: two centralized engines, TripleBit and RDF-3X, and three distributed engines, unone-on, dirtwo, and untwo-on. In partitioning, ParTripleBit cuts preprocessing time severalfold and yields the lowest redundancy and the best data load balance. In query processing, ParTripleBit achieves a 40% performance improvement over the three distributed engines and runs several times to tens of times faster than the two centralized engines. In scalability, query processing improves linearly or superlinearly as the number of compute nodes increases, and execution time grows sublinearly as the data size increases. ParTripleBit therefore scales well.
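
The abstract names a block-level variable-length integer delta compression scheme but does not describe its layout. As a rough illustration of the general idea only, and not the format ParTripleBit actually uses, the sketch below delta-encodes a sorted block of integer IDs and stores each gap as a 7-bit-per-byte variable-length integer. The function names (encode_varint, compress_block, decompress_block) and the byte layout are assumptions made for this example.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer using 7 payload bits per byte, low bits first."""
    if n < 0:
        raise ValueError("varint requires a non-negative integer")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(byte)         # last byte of this integer
            return bytes(out)


def compress_block(sorted_ids):
    """Delta-encode a sorted block of IDs, then varint-encode each gap.

    Hypothetical layout: the first ID is stored as a gap from 0.
    """
    out = bytearray()
    prev = 0
    for value in sorted_ids:
        if value < prev:
            raise ValueError("block must be non-negative and sorted in non-decreasing order")
        out += encode_varint(value - prev)
        prev = value
    return bytes(out)


def decompress_block(data: bytes):
    """Invert compress_block: decode each varint gap and accumulate a running sum."""
    ids = []
    prev = 0
    current = 0
    shift = 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7               # more bytes belong to this gap
        else:
            prev += current          # gap complete: recover the original ID
            ids.append(prev)
            current = 0
            shift = 0
    return ids


if __name__ == "__main__":
    ids = [1000, 1003, 1010, 5000]
    packed = compress_block(ids)
    print(len(packed), decompress_block(packed) == ids)
```

Because sorted IDs within a block tend to lie close together, most gaps fit in a single byte, which is why this kind of encoding can reduce the volume of data exchanged between nodes.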

Keywords/Search Tags:

Hypergraph Model, Traversal Tree Partitioning, Distributed Processing, Asynchronous Communication

Related items

1	Design And Implementation Of Parallel Graph-partitioning And Hypergraph-partitioning Methods For OpenFOAM
2	Graph Partitioning Algorithms And Its Applications In Distributed Network Environment
3	Research Of Multilevel Hypergraph Partitioning Algorithms And Its Application In Large-scale Parallel CFD Computations
4	An Asynchronous Graph Computing System On GPU With Hybrid Coloring Algorithm
5	The Approximate Algorithm Of Hypergraph Embedding Problem
6	Research On Circuit Partitioning Methodology Based On Graph Algorithms
7	Design And Implementation Of A Graph Traversal Framework For Distributed Graph Storage
8	Research On Graph Partitioning Algorithms And The Applications In The Large-scale Numerical Parallel Computation
9	Tree Structured Data Processing On GPUs
10	Research On The Circuit Partitioning Algorithm Of Embryonic Bio-inspired Hardware