Research And Implementation Of Large Scale Rule-Based Reasoning For The Semantic Web

Posted on: 2015-06-27
Degree: Master
Type: Thesis
Country: China
Candidate: F F Wang
GTID: 2348330485990390
Subject: Computer software and theory
Abstract/Summary:
In recent years, the Semantic Web has been widely used in various fields, and the volume of semantic data has grown rapidly. Large-scale semantic data contains a great deal of complex implicit information that is of great significance to many semantic applications. However, traditional reasoning engines were usually designed to run on a single machine, so they can hardly cope with such large amounts of data because of software and hardware resource limitations. Consequently, designing and implementing large-scale parallel reasoning systems has attracted considerable research interest.

Many parallel reasoning engines have been proposed in the past several years, such as engines based on DHTs, P2P networks, and MapReduce. However, because the reasoning process is a complex computing flow with many iterations and much data reuse, these systems failed to achieve efficient execution or high scalability.

To solve this problem, we propose a series of parallel reasoning techniques and solutions based on an in-depth analysis of semantic reasoning algorithms and of the widely used big data parallel processing platforms. We designed and implemented new algorithms and systems for parallel semantic reasoning. By type of reasoning rule, our research work and contributions fall into the following two parts:

1) Research on parallel RDFS reasoning. First, we studied the widely used RDFS inference and optimized the algorithm in three respects: the data partition model, the execution order of the reasoning rules, and the removal of duplicate data. Then we designed and implemented YARM (Yet Another Reasoning System with MapReduce), a parallel RDFS reasoning system built on MapReduce, and PRRS (A Parallel RDFS Reasoning System with Spark), a parallel RDFS reasoning system built on Spark.
Experimental results on large-scale benchmark and real-world datasets show that YARM and PRRS are about 10 times faster than the fastest MapReduce-based reasoning engine (reasoning-hadoop) and also achieve better scalability.

2) Research on parallel OWL reasoning. Building on the RDFS reasoning work, we further studied the more powerful, more complex, and more widely used OWL Horst reasoning rule set. The OWL reasoning algorithm is harder to parallelize because it contains a variety of reasoning rules with more complex computation flows; it needs many iterations and therefore requires a lot of data reuse and sharing. To solve these problems, we propose PROS, a new parallel OWL reasoning algorithm built on the Spark RDD model. PROS incorporates four major optimizations: first, we adopted broadcast and pre-shuffle strategies to optimize the join computation and reduce data communication overhead; second, we used the smart transitive closure method to compute transitive closures efficiently; third, a unified nominal representation was chosen for equivalent resources to avoid exponential derivation in the "owl:sameAs" reasoning rule; fourth, a new parallel reasoning algorithm on the Spark framework was designed and implemented based on the optimizations above. Experimental results on large-scale benchmark and real-world datasets show that PROS is about 8 to 20 times faster than the fastest MapReduce-based reasoning engine and also achieves better scalability.
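
Two of the optimizations named in the abstract, transitive closure computation and a single representative for "owl:sameAs"-equivalent resources, can be illustrated on one machine. The following is a minimal Python sketch under stated assumptions, not the Spark implementation from the thesis: triples are plain tuples, the function names `transitive_closure` and `canonicalize_same_as` are hypothetical, the closure uses ordinary semi-naive iteration (the abstract does not specify how the "smart" method works), and picking the lexicographically smallest resource as the representative is an assumption made for determinism.

```python
from collections import defaultdict

SAME_AS = "owl:sameAs"


def transitive_closure(edges):
    """Semi-naive closure: each round extends only the newly derived pairs.

    `edges` is a set of (a, b) pairs for one transitive property,
    e.g. rdfs:subClassOf. Hypothetical helper, not the thesis's method.
    """
    closure = set(edges)
    succ = defaultdict(set)          # base edges indexed by source
    for a, b in closure:
        succ[a].add(b)
    delta = set(closure)             # pairs derived in the previous round
    while delta:
        new_pairs = set()
        for a, b in delta:
            for c in succ[b]:        # extend path a->b with base edge b->c
                if (a, c) not in closure:
                    new_pairs.add((a, c))
        closure |= new_pairs         # set union also removes duplicates
        delta = new_pairs
    return closure


def canonicalize_same_as(triples):
    """Rewrite every resource to one representative of its sameAs class.

    Uses union-find; the representative is the smallest name in the class
    (an assumption). owl:sameAs triples themselves are dropped from the output.
    """
    triples = list(triples)
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        root = x
        while parent[root] != root:
            root = parent[root]
        while parent[x] != root:     # path compression
            parent[x], x = root, parent[x]
        return root

    for s, p, o in triples:
        if p == SAME_AS:
            rs, ro = find(s), find(o)
            if rs != ro:
                small, large = sorted((rs, ro))
                parent[large] = small   # keep the smallest name as root
    return {(find(s), p, find(o)) for s, p, o in triples if p != SAME_AS}


if __name__ == "__main__":
    sub_class = {("Student", "Person"), ("Person", "Agent")}
    print(sorted(transitive_closure(sub_class)))

    data = [
        ("ex:b", SAME_AS, "ex:a"),
        ("ex:c", SAME_AS, "ex:b"),
        ("ex:c", "ex:knows", "ex:d"),
    ]
    print(sorted(canonicalize_same_as(data)))
```

The design point behind the second function: if n resources are declared equivalent, materializing "owl:sameAs" naively yields on the order of n² equivalence pairs and repeats every triple once per alias, whereas rewriting to one representative keeps a single copy of each triple. In a distributed setting the alias-to-representative table would presumably be small enough to broadcast, but that is an inference from the abstract, not something it states.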
Keywords/Search Tags:Semantic Web, RDFS reasoning, OWL reasoning, parallel reasoning, MapReduce parallel computing framework, Spark parallel computing framework