Parallel Query Processing System On Large-scale RDF Data

Posted on: 2015-11-13
Degree: Master
Type: Thesis
Country: China
Candidate: G Yang
Full Text: PDF
GTID: 2308330452957200
Subject: Computer technology
Abstract/Summary:
The RDF (Resource Description Framework) data model was proposed for modeling Web objects as part of the development of the Semantic Web. It has been used in a variety of applications, such as Wikipedia, government, and biological information. The volume of RDF data is growing exponentially: RDF datasets now exceed one billion triples and continue to grow significantly. This explosive growth poses serious challenges for existing approaches to RDF data analysis and processing. Designing an efficient RDF query engine has therefore become an urgent problem.

The parallel query processing system on large-scale RDF data (TripleParallel) proposes an efficient technique for processing RDF data at the scale of one billion triples. Based on the characteristics of RDF data, the technique uses an RDF graph data structure for data abstraction. To speed up the processing of SPARQL queries, TripleParallel builds its parallel processing model on block granularity. For query plan generation, it uses selectivity estimation to determine the degree of each variable and to select the binding patterns of the query graph. At the block level, the parallel processing model works in units of blocks and separates data extraction from data manipulation, connecting the two processes with a pipeline. This approach raises the degree of parallelism, increases the overlap between data access and computation, and reduces the overall execution time of a query. Within a block, TripleParallel introduces a parallel join method and further optimizes the individual data manipulation operations to improve processing speed.

TripleParallel performs well in both block-level and intra-block processing, reducing query time by 25% compared with TripleBit. On the one hand, TripleParallel shortens the time from query planning to execution plan generation and makes the overall process more compact. On the other hand, it uses pipelined processing and accelerates execution both at the block level and within blocks, which balances the load across processors and improves the efficiency of concurrent execution for workloads of different sizes.
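
The block-level pipeline described in the abstract can be illustrated with a minimal sketch. This is not TripleParallel's implementation: the toy triples, the block size, and the names `extract_blocks`, `join_block`, and `run_query` are all assumptions made for illustration. One thread extracts matching triples in fixed-size blocks while a small worker pool joins each block as soon as it arrives, so extraction and manipulation overlap. In CPython the threads mainly show the structure of the pipeline rather than a real CPU speedup.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy data: (subject, predicate, object) triples.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "knows", "dave"),
    ("alice", "livesIn", "paris"),
    ("bob", "livesIn", "berlin"),
    ("carol", "livesIn", "paris"),
]

BLOCK_SIZE = 2  # assumed block size, chosen only for illustration


def extract_blocks(triples, predicate, out_queue):
    """Extraction stage: scan triples and emit blocks of (subject, object) pairs."""
    block = []
    for s, p, o in triples:
        if p == predicate:
            block.append((s, o))
            if len(block) == BLOCK_SIZE:
                out_queue.put(block)
                block = []
    if block:
        out_queue.put(block)  # final, possibly smaller block
    out_queue.put(None)  # sentinel: no more blocks


def join_block(block, lookup):
    """Manipulation stage: hash-join one block against the lookup table on ?y."""
    return [(x, y, lookup[y]) for x, y in block if y in lookup]


def run_query(triples):
    """Answer the pattern: ?x knows ?y . ?y livesIn ?city"""
    # Simplification: assumes at most one livesIn triple per subject.
    lookup = {s: o for s, p, o in triples if p == "livesIn"}
    blocks = queue.Queue(maxsize=4)  # bounded queue connects the two stages
    producer = threading.Thread(
        target=extract_blocks, args=(triples, "knows", blocks)
    )
    producer.start()
    futures = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Each block is handed to a worker as soon as it has been extracted.
        while (block := blocks.get()) is not None:
            futures.append(pool.submit(join_block, block, lookup))
    producer.join()
    results = []
    for future in futures:
        results.extend(future.result())
    return sorted(results)


if __name__ == "__main__":
    for row in run_query(TRIPLES):
        print(row)
```

The bounded queue is the design choice that matters here: it lets extraction run ahead of the join workers by a few blocks without buffering the whole intermediate result in memory.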
Keywords/Search Tags:block level parallel processing, parallel processing model, load balancing, parallel join algorithms