
Research And Implementation Of Large Scale Rule-Based Backward Chaining Reasoning For The Semantic Web

Posted on: 2017-07-15
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Wang
GTID: 2428330485960809
Subject: Computer technology
Abstract/Summary:
With the rapid growth of semantic data in recent years, the forward chaining reasoning method, which is well suited to static semantic data, has gradually exposed its flaws. To keep its results complete, forward chaining must re-run reasoning over the data every time the data is updated, which makes it inefficient. Backward chaining reasoning, which is insensitive to data updates, has therefore become a new research direction. Backward chaining is a goal-driven method: when a query arrives, it infers the results according to the given rule set. It is more complex than forward chaining, and because it happens at query time, its time overhead is larger than that of a pure query. This is the greatest obstacle to making it a practical method. Nowadays, most existing backward chaining reasoning systems are subordinate components of an RDF storage and query system, and their reasoning ability is relatively weak. Because of its complex reasoning procedure, the deep search space of rule expansions, and the difficulty of parallelizing it, backward chaining reasoning has so far failed to achieve efficient reasoning performance or high scalability. Building on existing backward chaining techniques and an in-depth analysis of the semantic rule sets, we propose efficient and scalable large-scale parallel backward chaining reasoning methods for both the RDFS and OWL rule sets on top of Spark. The main research work of this thesis falls into three parts.

First, we analysed in depth the procedure of backward chaining reasoning and its dependency on semantic data at different stages, and then designed a strategy to compute the terminological closure before real-time reasoning. Semantic data differs from general web data in that it comes with a domain-related ontology. Ontology data describes the relationships among concepts in a specific domain. It is usually small, and the rapid growth of semantic data increases the size of the instance data but not of the ontology data. In a rule set, a useful rule includes at least one ontology triple among its antecedents. The expanded patterns may therefore contain many duplicate terminological patterns, and recomputing these duplicates consumes a lot of time. We pre-compute the terminological closure and reuse it in the real-time reasoning stage. This removes many duplicate patterns and, as a result, substantially reduces the reasoning time.

Second, we designed separate optimizations for the reverse reasoning procedure, the querying procedure and the forward reasoning procedure of a backward chaining reasoning process. All of these optimizations contribute to the performance improvement of backward chaining reasoning. In the reverse reasoning procedure, we designed a strategy that prunes useless reasoning branches according to the data dependencies between different reasoning levels, and an optimized pattern selection function that determines the best execution order. In the querying procedure, we designed an RDD-based storage model and a strategy that uses a pre-shuffle technique to skip unrelated data during global scans. In the forward reasoning procedure, we designed a binding propagation method and a free variable method to optimize full query patterns, an optimization for redundant reasoning patterns that reduces recomputation and duplicate data, and an optimization for join operations that reduces I/O and network overhead.

Finally, we designed and implemented our backward chaining reasoning algorithm on top of Spark. Spark is one of the most popular big data processing frameworks owing to its good fault tolerance, high scalability and simple deployment. Being built on Spark, our method is highly versatile. Experimental results on both synthetic and real-world datasets show that our method achieves reasoning times of several seconds to tens of seconds on large-scale datasets of hundreds of millions of triples, together with high data scalability and node scalability.
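
The idea of pre-computing the terminological closure and reusing it at query time can be illustrated with a minimal sketch. It is written in plain Python rather than Spark, covers only the two standard RDFS subclass rules (rdfs11, transitivity of rdfs:subClassOf, and rdfs9, type inheritance), and all function names and the `ex:` sample identifiers are hypothetical. It is not the thesis's implementation.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def subclass_closure(schema_triples):
    """Compute the transitive closure of rdfs:subClassOf (rule rdfs11).

    This runs once, before any query, because the ontology is small
    and changes far less often than the instance data.
    """
    supers = defaultdict(set)
    for s, p, o in schema_triples:
        if p == SUBCLASS:
            supers[s].add(o)
    changed = True
    while changed:  # naive fixpoint iteration; fine for a small ontology
        changed = False
        for cls in list(supers):
            for parent in list(supers[cls]):
                new = supers.get(parent, set()) - supers[cls]
                if new:
                    supers[cls] |= new
                    changed = True
    return supers


def instances_of(target, instance_triples, closure):
    """Answer the goal (?x rdf:type target) at query time.

    Backward application of rule rdfs9: the goal expands to
    (?x rdf:type C) for every C that is target or a subclass of it.
    The subclass set is read from the pre-computed closure, so the
    terminological patterns are not re-derived per query.
    """
    wanted = {c for c, parents in closure.items() if target in parents}
    wanted.add(target)
    return {s for s, p, o in instance_triples if p == RDF_TYPE and o in wanted}


# Hypothetical sample data.
schema = [
    ("ex:GradStudent", SUBCLASS, "ex:Student"),
    ("ex:Student", SUBCLASS, "ex:Person"),
]
data = [
    ("ex:alice", RDF_TYPE, "ex:GradStudent"),
    ("ex:bob", RDF_TYPE, "ex:Student"),
    ("ex:carol", RDF_TYPE, "ex:Person"),
    ("ex:alice", "ex:knows", "ex:bob"),
]

closure = subclass_closure(schema)
people = instances_of("ex:Person", data, closure)
students = instances_of("ex:Student", data, closure)
```

In this sketch, adding a new instance triple to `data` requires no re-materialization: only `closure` is cached, and it depends solely on the schema triples, which mirrors the update-insensitivity argument made above.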
Keywords/Search Tags: Semantic Web, Backward Chaining, Rule-Based Reasoning, RDFS, OWL, Spark