Research And Implementation Of Large Scale Rule-Based Reasoning For The Semantic Web

Posted on:2015-06-27

Degree:Master

Type:Thesis

Country:China

Candidate:F F Wang

Full Text:PDF

GTID:2348330485990390

Subject:Computer software and theory

Abstract/Summary:

In recent years, the Semantic Web has been widely adopted in many fields, and the volume of semantic data has grown rapidly. Large-scale semantic data contain a great deal of complex implicit information that is of great significance to many semantic applications. However, traditional reasoning engines were often designed to run on a single machine, so they can hardly cope with such large amounts of data because of software and hardware resource limitations. Nowadays, designing and implementing large-scale parallel reasoning systems has attracted considerable research interest.

Many parallel reasoning engines have been proposed in the past several years, such as reasoning engines based on DHTs, P2P networks, MapReduce and so on. However, because the reasoning process is a complex computing flow with many iterations and data reuse issues, these reasoning systems have failed to achieve efficient execution or high scalability.

To solve this problem, we proposed a series of parallel reasoning techniques and solutions based on an in-depth analysis of semantic reasoning algorithms and of the widely used big data parallel processing platforms. We designed and implemented new algorithms and systems for parallel semantic reasoning. By type of reasoning rule, our research work and contributions fall into the following two parts:

1) Research on parallel RDFS reasoning. First, we studied the widely used RDFS inference and optimized the algorithm in three respects: the data partition model, the execution order of the reasoning rules, and the removal of duplicate data. Then we designed and implemented YARM (Yet Another Reasoning System with MapReduce), a parallel RDFS reasoning system built on MapReduce, and PRRS (A Parallel RDFS Reasoning System with Spark), a parallel RDFS reasoning system built on Spark.
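To make the RDFS part more concrete, here is a minimal single-machine sketch of two standard RDFS entailment rules: rdfs11 (transitivity of rdfs:subClassOf) and rdfs9 (type inheritance). It is not the YARM or PRRS implementation. The function name `rdfs_closure`, the prefixed-string encoding of triples, the sample data, and the choice to close the schema before touching instance data are all assumptions made for illustration.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def rdfs_closure(triples):
    """Return the input triples plus everything derivable with rdfs11 and rdfs9."""
    triples = set(triples)  # a set drops duplicate triples automatically

    # Schema first: transitive closure of rdfs:subClassOf (rule rdfs11).
    supers = {}
    for s, p, o in triples:
        if p == SUBCLASS:
            supers.setdefault(s, set()).add(o)
    changed = True
    while changed:
        changed = False
        for cls, parents in supers.items():
            inherited = set()
            for parent in parents:
                inherited |= supers.get(parent, set())
            if not inherited <= parents:
                parents |= inherited
                changed = True
    derived = {(c, SUBCLASS, d) for c, ds in supers.items() for d in ds}

    # Instance data second: a single pass of rdfs9 against the closed schema.
    for s, p, o in triples:
        if p == TYPE:
            derived |= {(s, TYPE, d) for d in supers.get(o, ())}
    return triples | derived


# Hypothetical sample data.
data = [
    ("ex:GradStudent", SUBCLASS, "ex:Student"),
    ("ex:Student", SUBCLASS, "ex:Person"),
    ("ex:alice", TYPE, "ex:GradStudent"),
]
closure = rdfs_closure(data)
```

Because the subclass hierarchy is fully closed before rdfs9 runs, the instance triples need only one pass for these two rules. In a distributed setting the small schema table would typically be shared with every worker, while the large instance data stays partitioned.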
Experimental results on large-scale benchmark and real-world datasets show that YARM and PRRS are about 10 times faster than the fastest MapReduce-based reasoning engine (reasoning-hadoop) and also achieve better scalability.

2) Research on parallel OWL reasoning. Building on the RDFS reasoning work, we further studied the more powerful, more complex and more widely used OWL Horst rule set. The OWL reasoning algorithm is harder to parallelize because it contains a variety of reasoning rules with more complex computation flows. It needs many iterations and thus requires a lot of data reuse and sharing. To solve these problems, we proposed PROS, a new parallel OWL reasoning algorithm built on the Spark RDD model. We designed four major optimizations in PROS: first, we adopted broadcast and pre-shuffle strategies to optimize the join computation and reduce data communication overhead; second, we used the smart transitive closure method to compute transitive closures efficiently; third, we chose a unified nominal representation for equivalent resources to avoid exponential derivation in the "owl:sameAs" reasoning rule; fourth, we designed and implemented a new parallel reasoning algorithm on the Spark framework based on the optimizations above. Experimental results on large-scale benchmark and real-world datasets show that PROS is about 8 to 20 times faster than the fastest MapReduce-based reasoning engine and also achieves better scalability.
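The idea of a single representative for each group of equivalent resources can be illustrated with a small union-find sketch. This is a single-machine illustration under stated assumptions, not the PROS algorithm: the helper names `find` and `canonicalize`, the sample data, and the rule of taking the lexicographically smallest identifier as the representative are hypothetical choices.

```python
SAME_AS = "owl:sameAs"


def find(parent, x):
    """Return the representative of x, compressing the path on the way."""
    parent.setdefault(x, x)
    root = x
    while parent[root] != root:
        root = parent[root]
    while parent[x] != root:
        parent[x], x = root, parent[x]
    return root


def canonicalize(triples):
    """Rewrite resources to their representative and drop owl:sameAs triples."""
    parent = {}
    for s, p, o in triples:
        if p == SAME_AS:
            a, b = find(parent, s), find(parent, o)
            if a != b:
                # Assumed tie-break: the smaller identifier becomes the representative.
                small, large = sorted((a, b))
                parent[large] = small

    # Resolve every resource that took part in an owl:sameAs statement.
    representative = {x: find(parent, x) for x in list(parent)}

    # Rewrite the remaining triples; the set merges copies that become identical.
    rewritten = set()
    for s, p, o in triples:
        if p != SAME_AS:
            rewritten.add((representative.get(s, s), p, representative.get(o, o)))
    return rewritten, representative


# Hypothetical sample data: ex:a, ex:b and ex:c denote the same resource.
data = [
    ("ex:b", SAME_AS, "ex:a"),
    ("ex:c", SAME_AS, "ex:b"),
    ("ex:c", "ex:knows", "ex:d"),
    ("ex:a", "ex:knows", "ex:d"),
]
rewritten, representative = canonicalize(data)
```

Without a representative, a group of n mutually equivalent resources produces on the order of n² owl:sameAs triples once symmetry and transitivity are applied, and every statement about one member is copied to all the others. Storing each statement once under a single identifier, together with the small `representative` table, avoids that blow-up.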

Keywords/Search Tags:

Semantic Web, RDFS reasoning, OWL reasoning, parallel reasoning, MapReduce parallel computing framework, Spark parallel computing framework

Related items

1	Research Of Large-scale RDF Graph Parallel Reasoning Method Based On The MapReduce
2	A Spark Based Semantic Reasoning Engine And Its Application
3	Research And Implementation Of Distributed RDF Data Parallel Reasoning Method
4	Research On Key Technology Of General Parallel Computing Framework In Multi-core Heterogeneous Environment
5	The Design Of Reasoning System Based On DL And Parallel Algorithm Research Of This System
6	EVIDENTIAL REASONING IN SEMANTIC NETWORKS: A FORMAL THEORY AND ITS PARALLEL IMPLEMENTATION (INHERITANCE, CATEGORIZATION, CONNECTIONISM, KNOWLEDGE REPRESENTATION)
7	Research Of Distributed Parallel Reasoning Method For Massive RDF Data
8	Research And Implement Of Reasoning Engine For Real Time IDES Based On CVE
9	Automated Reasoning Based On The Geometry Of Parallel Technology Owes Over Constraint To Determine The Study
10	Research On Apache Spark Distributed Parallel Computing Framework Optimization Technology